Abstract

We study the problem of detecting a random walk on a graph from a sequence of noisy measurements 
at every node. There are two hypotheses; either every observation is just meaningless zero-mean Gaussian 
noise, or at each time step exactly one node has an elevated mean, with its location following a random 
walk on the graph over time. We want to exploit knowledge of the graph structure and random walk 
parameters (specified by a Markov chain transition matrix) to detect a possibly very weak signal. The
optimal detector is easily derived, and we focus on the harder problem of characterizing its performance 
through the (type-II) error exponent: the decay rate of the miss probability under a false alarm constraint. 
The expression for the error exponent resembles the free energy of a spin glass in statistical physics, and 
we borrow techniques from that field to develop a lower bound. Our fully rigorous analysis uses large
deviations theory to show that the lower bound exhibits a phase transition; strong performance is only 
guaranteed when the signal-to-noise ratio exceeds twice the entropy rate of the random walk. Monte 
Carlo simulations show that the lower bound fully captures the behavior of the true exponent. 

Index Terms 

Detecting random walks, combinatorial testing, error exponent, product of random matrices, Lyapunov
exponent, random energy model, spin glasses, large deviations theory


I. Introduction 

Suppose we wish to make sense of a sequence of observations from nodes in a graph. The observations 
form a spatiotemporal matrix, where each column contains the measurements at all nodes at a particular 
snapshot in time. As illustrated in Figure 1, we need to distinguish between two hypotheses: (a) every
observation is just meaningless zero-mean Gaussian noise, or (b) an agent is undergoing a random walk 
on the graph and the measurement at its location at each time has an elevated mean. We do not know 
the exact path of the agent, but we do know its dynamics: with the graph structure assumed known, the 
agent’s movements follow a well-defined finite-state Markov chain. In effect, we would like to exploit 
our knowledge of the graph structure (or the Markov chain) to help detect a possibly very weak signal. 

In practice, this problem can arise from the detection of an intruder via a sensor network; the motion 
of a potential intruder might be modeled as a random walk on a graph representing the network, and one 
is tasked with testing the hypothesis that an intruder is currently present based on noisy measurements 
from each sensor. This kind of model has also been used in the detection of frequency-hopping or other
highly oscillatory signals [1]. More generally, it can be interpreted as the detection of a hidden Markov
process, a problem with many applications (see, e.g., [2]-[6]).

Fig. 1. Illustration of the two hypotheses under consideration. Each column of the observation matrix shows the measurements
at all nodes at a particular point in time. The null hypothesis $\mathcal{H}_0$ (top) is that all of the measurements are just noise. The
alternate hypothesis $\mathcal{H}_1$ (bottom) is that a single node has an elevated mean at each time, and that node is chosen by a random
walk. Here, we have illustrated a random walk on a line graph, but in this paper we consider the general case of any finite-state
Markov chain.

The task we have is a kind of combinatorial testing problem [7]-[10], in that there is an exponentially
large number of paths that could be anomalous. Thus, the alternative hypothesis is in fact a composite 
of an exponentially large number of simple hypotheses. Despite this complexity, the optimal Neyman- 
Pearson detector in our problem turns out to be easy to derive and computationally tractable. However, 
its performance is not so simple to characterize. 

We will use the (type-II) error exponent, which measures the rate of decay of the miss detection 
probability when the false alarm probability is held fixed, as the performance metric. One should expect 
it to depend on the signal-to-noise ratio (SNR) and the degree to which the Markov dynamics restrict the 
paths of the agent. If the SNR is too low, the true path will not be very different from the noise. But if 
the number of potential paths is very small, it may be easy to rule out false alarms, and performance will 
be better than when the number is very high. As the main focus of this paper, we will characterize the 
error exponent of the optimal detector and quantify the above intuition. We do this by deriving a fully 
rigorous lower bound to the error exponent, using ideas borrowed from statistical physics [11]-[14].

A. Related and Prior Work

Detecting a continuous Gauss-Markov process in Gaussian noise is a classical signal processing problem 
that has been extensively studied (see, e.g., [15], [16]). Hypothesis testing that tries to distinguish between
two different finite-state Markov chains based on noiseless realizations is also well-understood [17]-[19].
In this work, we focus on the related problem of detecting random walks on directed and weighted 
graphs (which are finite state Markov chains) based on noisy observations that are perturbed by additive 
Gaussian noise. These observations neither satisfy the Markov property nor are jointly Gaussian, making 
the problem a more difficult one. 

There is some prior work on detecting hidden Markov processes such as the one we consider in this 
paper. The structure of the optimal detector for a finite-state Markov chain in noise was addressed in [2],
[20]. We are interested in going further and characterizing the asymptotic performance of the optimal
detector by computing the error exponent. For the Gauss-Markov case, a closed-form expression for the 
error exponent was derived by Sung et al. Q using a state space representation. Our problem turns out 
to be more challenging. The error exponent, we shall see, is equal to the top Lyapunov exponent of 
the product of a sequence of random matrices [21], [22], a problem known to be difficult [23]. Leong
et al. m described a numerical technique to approximately compute the error exponent for detecting 
a two-state Markov chain in noise by discretizing a certain integral equation. Unfortunately, numerical 
solutions based on discretization become computationally intractable for general Markov chains with a 
large number of states, the case we address in this paper. In principle, one can always use Monte Carlo 
simulations to estimate the Lyapunov exponent (and thus the error exponent). However, they will not
easily provide insights relating the error exponents to the SNR and the Markov chain structures. 

Finally, we note that our problem is closely related to the general task of detecting nonzero-mean 
components of a Gaussian random vector @, ifTOl . E^ . Addario-Berry et al. characterized the performance
in a very general setting Q, bounding the Bayesian risk of the test; but in that work all of the
nonzero-mean support sets under test are equiprobable and there is no Markov structure. Arias-Castro 
et al. considered a problem similar to ours where a path on a graph has elevated mean while all other 
nodes are zero-mean Gaussians [8]; instead of a time series, they considered a single snapshot in the
asymptotic regime of very large graphs. 

In this paper, we consider general graphs (or Markov chains) with an arbitrary number of nodes. 
Drawing upon techniques originally developed in statistical physics [11]-[14], we compute a lower bound
on the error exponent that appears in practice to be quite sharp. The lower bound exhibits a phase transition 
at a certain threshold SNR, separating the detectable and undetectable regimes. Some of these results were 
previously presented in [25], [26], but we only justified them through nonrigorous arguments common
in the statistical physics literature. In this paper we use large deviations theory [27], [28] to provide a
fully rigorous derivation for the lower bound. 

B. Contributions 

We will precisely formulate the hypothesis testing problem in Section II and introduce and motivate
the error exponent as the performance metric. The main contributions of the paper will follow: 

(1) In Section III, we prove that the error exponent for this problem is well-defined and equal to the
asymptotic Kullback-Leibler (KL) divergence rate of the two hypotheses. We do this by generalizing
the standard Chernoff-Stein lemma [28], which gives the error exponent for independent and identically
distributed (i.i.d.) hypotheses, to the Markovian case.

(2) Later in Section III, we develop upper and lower bounds for the error exponent. The upper bound is
a simple genie bound. The lower bound is derived by borrowing techniques from statistical physics: it is
related to the free energy density of a new “spin glass” model [11]-[14], [29].

(3) We show how to explicitly compute the statistical physics-based lower bound. A rigorous proof of the
expression is technical, so we present our results in two steps: first, we provide in Section IV a high-level
overview of our approach, emphasizing ideas and intuitions rather than rigor. Our discussions there also
serve as a roadmap to the various results in Section V, where we use large deviations theory to rigorously
derive an expression for the lower bound and show how to compute it parametrically. The lower bound
we derive exhibits a phase transition at an SNR equal to twice the entropy rate of the Markov chain.
Below the threshold SNR, the bound is exactly equal to zero, indicating poor performance; above the
threshold, there is rapid improvement in performance as the SNR increases.

(4) In Section V-D, we compare the true error exponent (as estimated via Monte Carlo simulations)
to the lower bound and find that the bound fully captures its behavior, which appears to undergo a 
smoothed version of the phase transition at the predicted threshold. In the detectable SNR regime (above 
the threshold), our bound is also far better than an alternative bound obtained by ignoring the Markov 
structure, especially when the graph size is large. 

We offer some concluding remarks in Section VI.


II. Problem Formulation 


We consider testing the two hypotheses illustrated in Figure 1. The data form a matrix $Y_N = [y_{m,n}]$
with $1 \le m \le M$ and $1 \le n \le N$, where $M$ is the number of nodes in the graph and $N$ is the number
of observation times. As we allow the graph to be directed and weighted, the dynamics of an agent
following a random walk on the graph can model any finite-state Markov chain. The two hypotheses are
as follows:


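$$\mathcal{H}_0 : \; y_{m,n} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$$

$$\mathcal{H}_1 : \; s = (s_1, s_2, \ldots, s_N) \sim \text{Markov}(P), \qquad y_{m,n} \mid s \sim \mathcal{N}\big(\beta \, \mathbb{1}(m = s_n), \, 1\big) \text{ independently}$$
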
where $P$ is the known transition matrix of an irreducible and aperiodic $M$-state Markov chain [so that
$\Pr(s_{n+1} = j \mid s_n = i) = P_{ij}$, the $ij$th entry of $P$].

Under the null hypothesis $\mathcal{H}_0$, the measurements are just i.i.d. zero-mean standard Gaussian noise.
Under the alternate hypothesis $\mathcal{H}_1$, there is a sequence of states $s = (s_1, s_2, \ldots, s_N) \in \{1, \ldots, M\}^N$
produced by a Markov chain with transition matrix $P$, and we assume that $s_1$ is drawn from its unique
stationary distribution $\pi$. By the Perron-Frobenius theorem for irreducible matrices [30], the elements
of $\pi$ are all positive, meaning each state has a positive probability of being initially chosen. Given the
state sequence $s$, the entries of the data matrix are still independent Gaussian random variables. The
difference is just that, in each column $n$, the Gaussian random variable at the $s_n$th entry has an elevated
mean $\beta$. This can be interpreted as the “signature” or “evidence” left behind by the agent. The variance
in both hypotheses is set to 1 without loss of generality; what matters is the signal-to-noise ratio (SNR)
of $\beta^2$. In what follows, we will use $P_0(\cdot)$ and $P_1(\cdot)$ to refer to the probability laws under $\mathcal{H}_0$ and $\mathcal{H}_1$,
respectively, and $\mathbb{E}_0$ and $\mathbb{E}_1$ to refer to the corresponding expectation operators.
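
As a minimal sketch of the observation model just described, the following pure-Python snippet draws a data matrix under either hypothesis. The function names (`sample_path`, `sample_observations`), the 0-based node indexing, and the three-node example chain are illustrative assumptions and are not taken from the paper.

```python
import random

def sample_path(P, pi, N, rng):
    """Draw s_1, ..., s_N: s_1 ~ pi, then transitions according to the rows of P."""
    nodes = range(len(pi))
    path = rng.choices(nodes, weights=pi)          # list holding s_1
    for _ in range(N - 1):
        path.append(rng.choices(nodes, weights=P[path[-1]])[0])
    return path

def sample_observations(P, pi, N, beta, rng, signal_present):
    """Return (Y, path). Y is a list of N columns, each holding M measurements."""
    M = len(pi)
    path = sample_path(P, pi, N, rng) if signal_present else None
    Y = []
    for n in range(N):
        column = [rng.gauss(0.0, 1.0) for _ in range(M)]   # unit-variance noise
        if signal_present:
            column[path[n]] += beta                         # elevated mean at s_n
        Y.append(column)
    return Y, path

# Hypothetical example: a lazy random walk on a 3-node line graph.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]   # stationary distribution of this particular P
Y1, path = sample_observations(P, pi, 200, 1.0, random.Random(0), True)
Y0, no_path = sample_observations(P, pi, 200, 1.0, random.Random(1), False)
```

Each column of `Y1` differs in distribution from a column of `Y0` only at the single entry visited by the walk, which is what makes a weak signal hard to see without using the chain structure.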

The optimal detector, that which minimizes the miss detection probability for a fixed false alarm
probability, is the Neyman-Pearson detector [31]. The corresponding decision rule compares the likelihood
ratio $L(Y_N) = P_1(Y_N) / P_0(Y_N)$ to a threshold and chooses $\mathcal{H}_1$ only if it exceeds the threshold. The likelihood
ratio for this problem can be computed as

$$L(Y_N) = \sum_{s} P(s) \exp\Big( \sum_{n=1}^{N} \Big( \beta y_{s_n,n} - \frac{\beta^2}{2} \Big) \Big), \tag{1}$$

where $P(s) = \pi_{s_1} P_{s_1,s_2} \cdots P_{s_{N-1},s_N}$ is the probability of the state sequence $s$ under the Markov chain $P$.
Conditioned on the state sequence $s$, the distribution of the variable $y_{m,n}$ is different under the two hypotheses
only if $m = s_n$. The expression in (1) might appear complicated, as the sum is over an exponentially
large ($M^N$) number of possible state sequences. However, the likelihood ratio turns out to be easy to
compute: it was shown in lO that $L(Y_N)$ can be reformulated in terms of a matrix product¹:

$$L(Y_N) = \pi^T D_1 P D_2 P \cdots P D_N \mathbf{1}, \tag{2}$$

where $P$ is the transition matrix of the Markov chain, $\mathbf{1}$ is the all-ones vector, and $D_n$ is a diagonal matrix defined as

$$D_n \overset{\text{def}}{=} \exp\Big(-\frac{\beta^2}{2}\Big) \operatorname{diag}\big( \exp(\beta y_{1,n}), \ldots, \exp(\beta y_{M,n}) \big)$$

for $1 \le n \le N$. Thus, the likelihood ratio can be computed in $O(M^2 N)$ time.
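
The matrix-product form (2) can be evaluated with a forward recursion on a row vector. The sketch below is one way to do this, under the assumption that the data are stored as a list of $N$ columns; the function names, the per-step renormalization (added only to avoid numerical underflow for long sequences), and the small example data are illustrative choices rather than anything prescribed by the paper. A brute-force evaluation of the path sum (1) is included purely as a cross-check on tiny instances.

```python
import itertools
import math

def log_likelihood_ratio(Y, P, pi, beta):
    """log(pi^T D_1 P D_2 P ... P D_N 1) via a normalized forward recursion, O(M^2 N)."""
    M = len(pi)
    v = list(pi)                 # current row vector, renormalized at every step
    log_L = 0.0
    for n, column in enumerate(Y):
        if n > 0:                # v <- v P
            v = [sum(v[i] * P[i][j] for i in range(M)) for j in range(M)]
        # v <- v D_n, with D_n = exp(-beta^2/2) diag(exp(beta * y_{m,n}))
        v = [v[m] * math.exp(beta * column[m] - 0.5 * beta ** 2) for m in range(M)]
        total = sum(v)
        log_L += math.log(total)  # accumulate the scale that is divided out
        v = [x / total for x in v]
    return log_L

def log_likelihood_ratio_bruteforce(Y, P, pi, beta):
    """Direct evaluation of the sum over all M^N paths in (1); tiny N only."""
    M, N = len(pi), len(Y)
    L = 0.0
    for s in itertools.product(range(M), repeat=N):
        prob = pi[s[0]]
        for a, b in zip(s, s[1:]):
            prob *= P[a][b]
        exponent = sum(beta * Y[n][s[n]] - 0.5 * beta ** 2 for n in range(N))
        L += prob * math.exp(exponent)
    return math.log(L)

# Hypothetical 3-node chain and a 4-column data matrix (values made up for the check).
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
Y = [[0.3, -1.2, 0.5], [1.1, 0.0, -0.7], [-0.4, 2.0, 0.1], [0.9, -0.3, -1.5]]
fast = log_likelihood_ratio(Y, P, pi, 0.8)
slow = log_likelihood_ratio_bruteforce(Y, P, pi, 0.8)
```

The two values agree up to floating-point error, while only the recursion remains usable once $N$ is more than a handful of time steps.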

A far more difficult problem is to characterize the performance of the detector, i.e., to compute the
type-I (false alarm) error probability $P_{\text{false\_alarm}}$ and the type-II (miss) error probability $P_{\text{miss}}$. Under the
optimal detector, these are given by the expressions

$$P_{\text{false\_alarm}} = \int_{L(Y_N) > \tau} P_0(Y_N) \, d^{MN}y, \qquad P_{\text{miss}} = \int_{L(Y_N) \le \tau} P_1(Y_N) \, d^{MN}y,$$

where $\tau$ is the Neyman-Pearson threshold chosen to achieve the constraint on $P_{\text{false\_alarm}}$, and the integrals
are over all $MN$ variables $\{y_{m,n}\}$. These are very high dimensional integrals for which only Monte Carlo
techniques would be practical. However, we would like to say something about the performance of these
systems without having to simulate them. In particular, we expect that the performance depends on two
parameters: the element-wise SNR $\beta^2$, and some measure of the complexity of the Markov chain $P$. For
example, more restrictive dynamics for the state sequence $s$ should make it easier to correctly distinguish
between the two hypotheses.

We consider the asymptotic performance of a detector as $N \to \infty$, i.e., as the observation time increases
without bound. Let $\epsilon \in (0,1)$ be a constant. Given a sequence of optimal detectors $\delta_N(Y_N)$ with false
alarm constraint $P_{\text{false\_alarm}} \le \epsilon$ (where $\delta_N$ has access to $N$ observations of the network), the (type-II)
error exponent is

$$\eta = -\lim_{N \to \infty} \frac{1}{N} \log P_{\text{miss}}(\delta_N). \tag{3}$$

This means that $P_{\text{miss}}(\delta_N) = \exp(-\eta N + o(N))$, so that the dominant feature of the miss probability
is that it decays exponentially with a rate of $\eta$. In the remainder of this paper, we will first prove that
the error exponent in (3) is indeed a well-defined quantity, and then explore techniques to analytically
characterize it.

III. The Error Exponent

A. Existence 

The first question is whether the error exponent $\eta$ is a well-defined quantity. If $\mathcal{H}_0$ and $\mathcal{H}_1$ were
both i.i.d. hypotheses with single-letter marginal densities $p_0(\cdot)$ and $p_1(\cdot)$, then the Chernoff-Stein lemma
[28] would tell us that $\eta = D(p_0 \| p_1) = -\mathbb{E}_0 \log \frac{p_1(y)}{p_0(y)}$, the Kullback-Leibler divergence of $p_1$ from $p_0$.
However, since $\mathcal{H}_1$ for our problem is not an i.i.d. hypothesis, the lemma in its original form is not
applicable. So we prove the following generalization.

¹Readers with a background in statistical physics may recognize this formula as an immediate consequence of the “transfer
matrix” method [32] as applied to a one-dimensional generalized Potts model with a quenched random field.

Lemma 1 (Generalized Chernoff-Stein Lemma): Suppose we have a sequence of hypotheses $\mathcal{H}_0$ and
$\mathcal{H}_1$ with a well-defined Kullback-Leibler divergence rate

$$\kappa \overset{\text{def}}{=} -\lim_{N \to \infty} \frac{1}{N} \mathbb{E}_0 \log \frac{P_1(Y_N)}{P_0(Y_N)} = -\lim_{N \to \infty} \frac{1}{N} \mathbb{E}_0 \log L(Y_N).$$

Suppose furthermore that under $\mathcal{H}_0$, the normalized log likelihood ratio $\ell_N = \frac{1}{N} \log L(Y_N)$ converges
in probability to the limit of its expectation, $-\kappa$. Then the error exponent $\eta$ is well defined and $\eta = \kappa$.
Proof: See Appendix A. ■

To apply Lemma 1 to our problem, we need to verify that its assumptions hold. This is established by
the following proposition, which uses results from the theory of matrix-valued stochastic processes [21]:


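Proposition 1: The Kullback-Leibler divergence rate for our problem,

$$\kappa = -\lim_{N \to \infty} \frac{1}{N} \mathbb{E}_0 \log \big( \pi^T D_1 P D_2 P \cdots P D_N \mathbf{1} \big),$$

exists. Further, under $\mathcal{H}_0$, the normalized log likelihood ratio converges almost surely:

$$\lim_{N \to \infty} \frac{1}{N} \log \big( \pi^T D_1 P D_2 P \cdots P D_N \mathbf{1} \big) = -\kappa, \tag{4}$$

and thus it converges in probability.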

Proof: We first note that, since $P$ is a stochastic matrix, we have $P\mathbf{1} = \mathbf{1}$, so we can add an extra
factor of $P$ into the expression (2) to obtain

$$L(Y_N) = \pi^T D_1 P \cdots P D_N P \mathbf{1}. \tag{5}$$

Under $\mathcal{H}_0$, the factors $\{D_n P\}_{n \ge 1}$ form an i.i.d. sequence of random matrices, with randomness
induced by the Gaussian variables in the definition of $D_n$. In a classical paper [21], Furstenberg and
Kesten showed that for an i.i.d. sequence of random matrices $X_n$, if $\mathbb{E} \log^+ \|X_n\|_\infty$ is finite², the limit
$\lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \|X_1 \cdots X_N\|_\infty$ exists and the random quantity $\frac{1}{N} \log \|X_1 \cdots X_N\|_\infty$ converges almost
surely to the same limit. This quantity is equivalent to what is known as the (top) Lyapunov exponent:
the exponential rate of growth or decay of a product of random matrices. First, let us show that the result
applies to the factors $\{D_n P\}$. For any fixed $n$, we have:

$$\mathbb{E} \log^+ \|D_n P\|_\infty \le \mathbb{E} \big| \log \|D_n P\|_\infty \big| = \mathbb{E} \Big| \log \max_m \exp\Big( \beta y_{m,n} - \frac{\beta^2}{2} \Big) \Big| = \mathbb{E} \Big| \beta \max_m y_{m,n} - \frac{\beta^2}{2} \Big| < \infty.$$

So the condition we need to apply the Furstenberg-Kesten result holds. Now we must relate the likelihood
ratio to the norm of the product of random matrices. Using Hölder’s inequality, we have

$$\pi^T D_1 P \cdots P D_N P \mathbf{1} \le \|\pi\|_1 \, \|D_1 P \cdots P D_N P \mathbf{1}\|_\infty = \|D_1 P \cdots P D_N P\|_\infty, \tag{6}$$

where (6) follows from the definition of the matrix $\infty$-norm and the fact that all of the matrices involved
are nonnegative and all of the vectors are positive. Meanwhile, as a lower bound, we let $\pi_{\min} = \min_m \pi_m$
and it holds that

$$\pi^T D_1 P \cdots P D_N P \mathbf{1} \ge \pi_{\min} \|D_1 P \cdots P D_N P \mathbf{1}\|_1 \ge \pi_{\min} \|D_1 P \cdots P D_N P\|_\infty, \tag{7}$$

where (7) again follows from the definition of the matrix $\infty$-norm. So we can sandwich the log likelihood
ratio to within a vanishing constant:

$$\frac{1}{N} \log \|D_1 P \cdots D_N P\|_\infty + \frac{1}{N} \log \pi_{\min} \le \frac{1}{N} \log L(Y_N) \le \frac{1}{N} \log \|D_1 P \cdots D_N P\|_\infty.$$

The outer expressions converge almost surely and in expectation due to Furstenberg and Kesten’s results
[21, Theorems 1 and 2], so the log likelihood ratio must converge in the same way. ■

²Here, $\log^+(x) = \max\{0, \log(x)\}$, and the matrix $\infty$-norm is induced by the $\ell_\infty$ vector norm and is given by
$\|X\|_\infty = \max_i \sum_j |X_{ij}|$.

Remark: Note that the proof only requires that the probability distribution of the initial state $s_1$ be
positive at all nodes; there is no need to start from the stationary distribution $\pi$. In fact, since $P$ is
irreducible and aperiodic, we could relax the positivity constraint on the initial distribution and start with
any known distribution.

Lemma 1 and Proposition 1 indicate that computing the error exponent boils down to computing the
top Lyapunov exponent of products of random matrices, a problem known to be hard [23]. For $M \times M$
matrices, it generally requires solving an integral equation to obtain the invariant measure of a continuous
diffusion process on an $M$-dimensional real projective space [33]. In low dimensions (e.g., $M = 2$ or 3),
this can be done with numerical quadrature (see, e.g., @, [34]), but this becomes intractable for high
dimensional problems. Thanks to almost sure convergence of the normalized partial products in (4), one
can use Monte Carlo simulations to estimate the error exponents. A simple Monte Carlo procedure that
does just that is presented in Section V-D, where we report some results of numerical simulations.
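
To make the almost-sure convergence in (4) concrete, here is a minimal sketch of how one long noise-only realization could be used to estimate the error exponent. It is not the procedure of Section V-D (which is not reproduced here); the function name, the sequence length, the seed, and the three-node example chain are all assumptions made for illustration.

```python
import math
import random

def estimate_error_exponent(P, pi, beta, N, rng):
    """Estimate eta as -(1/N) log L(Y_N), with Y_N drawn under the null hypothesis."""
    M = len(pi)
    v = list(pi)                 # normalized row vector pi^T D_1 P ... D_n
    log_L = 0.0
    for n in range(N):
        if n > 0:                # v <- v P
            v = [sum(v[i] * P[i][j] for i in range(M)) for j in range(M)]
        # v <- v D_n, drawing the noise column y_{.,n} ~ N(0, 1) on the fly
        v = [v[m] * math.exp(beta * rng.gauss(0.0, 1.0) - 0.5 * beta ** 2)
             for m in range(M)]
        total = sum(v)
        log_L += math.log(total)
        v = [x / total for x in v]
    return -log_L / N

# Hypothetical example: lazy random walk on a 3-node line graph, beta = 1.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
eta_hat = estimate_error_exponent(P, pi, 1.0, 5000, random.Random(0))
```

A single run gives a noisy estimate; averaging several independent runs or increasing `N` reduces the fluctuation, at a cost that grows only linearly in `N`.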


B. Upper and Lower Bounds 

Obtaining analytical expressions for the error exponents for general Markov chain structures is expected
to be a very challenging task. Instead, we will focus on deriving bounds for the error exponents. The
Lyapunov exponent formulation of the error exponent as given in (4) does not lend itself to easy analysis.
To proceed, we use the alternative form of the likelihood ratio in (1) to rewrite the error exponent as
follows:

$$\eta = -\lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \Big( \sum_{s} P(s) \exp\Big( \beta y_s - \frac{N \beta^2}{2} \Big) \Big) = \frac{\beta^2}{2} - \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \Big( \sum_{s} P(s) \exp(\beta y_s) \Big), \tag{8}$$

where $s = (s_1, s_2, \ldots, s_N) \in \{1, \ldots, M\}^N$ is a state sequence of the Markov chain, and we define

$$y_s \overset{\text{def}}{=} \sum_{n=1}^{N} y_{s_n,n} \tag{9}$$

to be the sum of the Gaussian random variables associated with a given state sequence $s$. Here, and in
what follows, we shall simply use $\mathbb{E}$ to refer to the expectation under $\mathcal{H}_0$, since we have no further use
for $\mathbb{E}_1$. To study the behavior of the error exponent, we just need to study

$$\psi(\beta) = \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \Big( \sum_{s} P(s) \exp(\beta y_s) \Big). \tag{10}$$
We will derive upper and lower bounds on this quantity, which will translate into bounds on the error
exponent $\eta$. There is a simple lower bound: by treating the sum $\sum_s P(s) \exp(\beta y_s)$ as an expectation
and applying Jensen’s inequality, we get

$$\psi(\beta) \ge \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \sum_{s} P(s) \log \exp(\beta y_s) = 0.$$

This then gives us an upper bound for the error exponent,

$$\eta \le \frac{\beta^2}{2},$$

which can also be interpreted as the “genie” bound: if we are given the true state sequence $s$, then we
can examine only the variables along that path and ignore all others, leading to an i.i.d. hypothesis testing
problem with error exponent $\beta^2/2$. It provides an upper bound on the true error exponent since the extra
side information about the correct path can only improve the performance.

To get a lower bound on $\eta$, we can still apply Jensen’s inequality, but this time to the outer expectation
$\mathbb{E}$ in (10), to obtain

$$\psi(\beta) \le \lim_{N \to \infty} \frac{1}{N} \log \Big( \sum_{s} P(s) \, \mathbb{E} \exp(\beta y_s) \Big) = \frac{\beta^2}{2},$$

which gives us $\eta \ge 0$. Of course, this is trivial since $\eta$ is equal to a limit of Kullback-Leibler divergences,
which are always nonnegative. Another lower bound can be obtained by considering the test statistics
$y_n = \sum_m y_{m,n}$, the sums of the measurements over all nodes at each time step. Since we are discarding
information, the error exponent for this problem can be no greater than that for the original problem. But
the new problem is just testing two i.i.d. hypotheses $y_n \sim \mathcal{N}(0, M)$ and $y_n \sim \mathcal{N}(\beta, M)$. As we know, in the
i.i.d. case the error exponent is simply the Kullback-Leibler divergence of these two densities, giving us a lower
bound of

$$\eta \ge \frac{\beta^2}{2M}.$$

This is a nontrivial bound, but just barely. For large $M$, the error exponent of this sum-based test is very
small indeed. In fact, we would need $M$ times the observation length to obtain the same performance as the
genie-aided detector.
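
For completeness, the value $\beta^2/(2M)$ can be checked directly from the definition of the Kullback-Leibler divergence. In the short derivation below, $p_0$ and $p_1$ are simply names introduced here for the $\mathcal{N}(0, M)$ and $\mathcal{N}(\beta, M)$ densities of the summed statistic:

```latex
\begin{align*}
D(p_0 \,\|\, p_1)
  &= \mathbb{E}_0 \log \frac{p_0(y)}{p_1(y)}
   = \mathbb{E}_0 \left[ \frac{(y - \beta)^2 - y^2}{2M} \right] \\
  &= \frac{\beta^2 - 2\beta \, \mathbb{E}_0[y]}{2M}
   = \frac{\beta^2}{2M},
\end{align*}
```

since $\mathbb{E}_0[y] = 0$. Equating the leading-order miss probabilities $\exp(-N \beta^2 / (2M))$ and $\exp(-N' \beta^2 / 2)$ gives $N = M N'$, which is the factor of $M$ in observation length referred to above.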

We will spend the remainder of this section and all of the next two sections computing a nontrivial
lower bound for $\eta$, one that we will find empirically to fully capture its behavior. Qualitatively, this lower
bound will guarantee that, above a certain threshold SNR, the error exponent will be bounded by

$$\eta \ge \frac{\beta^2}{2} - O(\beta),$$

meaning that, to leading order, the maximum likelihood detector will be just as good as the genie-aided
detector.

To develop this bound, we will borrow ideas from the theory of spin glasses [11]-[14], [29], a class
of disordered systems studied in statistical physics. In fact, we have already chosen our notation so that
our result closely resembles the quantities studied in that field. In particular, the function $\psi(\beta)$ resembles
the so-called “free energy density” of a spin glass, defined as

$$\phi(\beta) = -\frac{1}{\beta} \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \sum_{s} \exp\big( -\beta H(s) \big), \tag{11}$$

where $N$ is the number of particles in the spin glass, $s$ is an indexing vector representing
the configurations of the system (there are typically an exponentially large number of them), $\beta$ is the
inverse temperature parameter, and $H(\cdot)$ is a random Hamiltonian, a function defining the energy of each
configuration. For our problem, we can write the function in (10) as $\psi(\beta) = -\beta \phi(\beta)$ if we choose the
Hamiltonian to be

$$H(s) = -y_s - \frac{1}{\beta} \log P(s). \tag{12}$$

Despite the extra factor of $-\beta$, to be concise we will abuse the terminology and henceforth refer to $\psi(\beta)$
for our problem as the free energy density.

Computing the free energy density of a disordered system is often very difficult. In fact, there are
seemingly simple models that have been studied for many years with no exact solution [29], [35].
The main challenge lies in the fact that the free energy density $\phi(\beta)$ in (11) involves the sum of an
exponentially large number of random variables. The high-dimensional correlation structures of the
random Hamiltonians $\{H(s)\}_s$ can often lead to remarkable phenomena (see, e.g., [13], [29], [36]).

In our problem, the correlations of the Hamiltonians can be computed as follows. Let $s^1, s^2$ denote
two arbitrary paths of the Markov chain, and let $H(s^1), H(s^2)$ be the associated Hamiltonians as defined
in (12). Using (9), we can easily verify that

$$\operatorname{cov}\big( H(s^1), H(s^2) \big) = \sum_{n=1}^{N} \mathbb{1}(s^1_n = s^2_n), \tag{13}$$

where $\mathbb{1}(\cdot)$ is the indicator function. This means that the Hamiltonians of the various states in our problem
are indeed correlated, and the covariance is equal to the number of times the two sequences overlap.
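
The overlap formula (13) is easy to check numerically. Because $\log P(s)$ is deterministic, the covariance of two Hamiltonians equals the covariance of the corresponding path sums $y_{s^1}$ and $y_{s^2}$ under the null hypothesis. The sketch below compares the overlap count with an empirical covariance; the two example paths, the number of trials, and the function names are assumptions chosen only for illustration.

```python
import random

def overlap(s1, s2):
    """Number of time steps at which the two paths visit the same node."""
    return sum(1 for a, b in zip(s1, s2) if a == b)

def path_sum(Y, s):
    """y_s: sum of the measurements along path s (Y is a list of columns)."""
    return sum(Y[n][s[n]] for n in range(len(s)))

def empirical_covariance(s1, s2, M, trials, rng):
    """Monte Carlo estimate of E[y_{s1} y_{s2}] under pure noise (both means are 0)."""
    N = len(s1)
    acc = 0.0
    for _ in range(trials):
        Y = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
        acc += path_sum(Y, s1) * path_sum(Y, s2)
    return acc / trials

# Two hypothetical length-4 paths on 3 nodes; they coincide at steps 0 and 2.
s1 = [0, 1, 2, 1]
s2 = [0, 2, 2, 0]
exact = overlap(s1, s2)
estimate = empirical_covariance(s1, s2, 3, 20000, random.Random(0))
```

Paths that never coincide give independent path sums, while a path compared with itself recovers the variance $N$.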

In the spin glass literature, removing or just reducing the correlations between state Hamiltonians can
often simplify a problem [29], [37]. We follow this idea: if we drop the correlations, we obtain a modified
function³

$$\varphi(\beta) = \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \Big( \sum_{s} P(s) \exp(\beta x_s) \Big), \tag{14}$$

where $x_s \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, N)$, i.e., they are an uncorrelated Gaussian ensemble with the same variance as the
$y_s$. We note that the two functions in (14) and (10) have exactly the same form, the only difference
being the absence of correlation in $\{x_s\}$. Dropping the correlation, as we shall see, makes our problem
tractable⁴. Interestingly, it also provides a lower bound on the error exponent, which is precisely what
we seek for our problem. The argument relies on the following lemma:

Lemma 2 (Slepian’s Lemma [29, pp. 12-15]): Let the function $F : \mathbb{R}^L \to \mathbb{R}$ (for some $L$) satisfy the
moderate growth condition

$$\lim_{\|v\| \to \infty} F(v) \exp(-a \|v\|^2) = 0 \quad \text{for all } a > 0,$$

and have nonnegative mixed derivatives:

$$\frac{\partial^2 F}{\partial v_i \, \partial v_j} \ge 0 \quad \text{for } i \ne j.$$

Suppose that we have two independent zero-mean Gaussian random vectors $x$ and $y$ taking values in
$\mathbb{R}^L$ such that $\mathbb{E} x_i^2 = \mathbb{E} y_i^2$ and $\mathbb{E} y_i y_j \ge \mathbb{E} x_i x_j$ for $i \ne j$. Then $\mathbb{E} F(y) \ge \mathbb{E} F(x)$.

Applying this to $\varphi(\beta)$ gives us the desired lower bound on the error exponent:

Proposition 2: The error exponent satisfies $\eta \ge \frac{\beta^2}{2} - \varphi(\beta)$.


³Strictly speaking, we need to show that $\varphi(\beta)$ exists, i.e., that the limit is actually well-defined. We will do this in Section V
by actually computing it. Until then, we presuppose its existence in all our arguments.

⁴In spin glass parlance, our function $\varphi(\beta)$ may be regarded as the (rescaled) free energy density of a new generalization of
the random energy model (REM) (53.

Proof: Define $F(v) = -\log\big( \sum_s P(s) \exp(\beta v_s) \big)$. This is a function from $\mathbb{R}^{M^N}$ to $\mathbb{R}$ that clearly
satisfies the moderate growth condition. We can compute the cross second derivative with respect to $v_{s^1}$
and $v_{s^2}$, with $s^1 \ne s^2$, as:

$$\frac{\partial^2 F}{\partial v_{s^1} \, \partial v_{s^2}} = \frac{\beta^2 P(s^1) P(s^2) \exp\big( \beta (v_{s^1} + v_{s^2}) \big)}{\big[ \sum_s P(s) \exp(\beta v_s) \big]^2},$$

which is clearly nonnegative. From (13), we know that for $s^1 \ne s^2$, $\mathbb{E} y_{s^1} y_{s^2} \ge 0$, and we have constructed
the $x$ ensemble so that $\mathbb{E} x_{s^1} x_{s^2} = 0$. Thus, applying Slepian’s Lemma gives us $\mathbb{E} F(y) \ge \mathbb{E} F(x)$, which
is equivalent to $\psi(\beta) \le \varphi(\beta)$. The statement of the proposition then follows immediately from (8). ■

Next, we will show how to explicitly compute $\varphi(\beta)$ by using tools from large deviations theory. Before
delving into the technical results, we first present in Section IV a high-level and non-rigorous overview of
the main ideas used in our approach. The discussions there also provide a roadmap to the various rigorous
arguments that lead to our final results, stated as Theorem [3] and Propositions and |7] in Section |Vl 


IV. Main Ideas and Roadmap to the Technical Results

To begin, we can rewrite the free energy density as:

$$\varphi(\beta) = \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \sum_{s \,:\, P(s) > 0} \exp\big( \beta x_s + \log P(s) \big), \tag{15}$$

where we are considering only the set of paths in $\{1, \ldots, M\}^N$ that have nonzero probability under
the Markov chain $P$ (the other paths contributed nothing to the sum in the first place).

We can group the terms of the sum by their $\frac{1}{N} \log P(s)$ and $\frac{1}{N} x_s$ values, dividing them into bins with
a small width $\delta$. If we count the number of configurations (i.e., paths) in each bin as

$$C_N^\delta(\rho, \xi) \overset{\text{def}}{=} \#\big\{ s : P(s) > 0, \ \log P(s) \in [N\rho, N(\rho + \delta)] \text{ and } x_s \in [N\xi, N(\xi + \delta)] \big\},$$

then we should be able to approximate the sum as

$$\varphi(\beta) \approx \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \sum_{\rho} \sum_{\xi} C_N^\delta(\rho, \xi) \exp\big( N [\beta \xi + \rho] \big), \tag{16}$$

where the sums are over corner points of the bins. In Section V, we will show that a form of this
approximation can be made exact.

Of course, $C_N^\delta(\cdot, \cdot)$ is random due to its dependence on the Gaussian variables $\{x_s\}$, but it turns out that
there will be a concentration of measure phenomenon that will allow us to treat it deterministically in the
large $N$ limit. If we consider only the marginal count $C_N^\delta(\rho)$ of paths satisfying $\log P(s) \in [N\rho, N(\rho + \delta)]$,
then there is no randomness involved; we can show that this count grows exponentially:

$$C_N^\delta(\rho) = \exp\Big( N \Big[ \sup_{\rho' \in [\rho, \rho + \delta]} s(\rho') \Big] + o(N) \Big),$$

where $s(\rho)$ is the “microcanonical entropy density” function for $\frac{1}{N} \log P(s)$. This is physics jargon for
the exponential growth rate of the number of configurations within an energy level [13]. In Section V, we
will show how to compute it (see Proposition H]) and derive several important properties (see Proposition
[S]). A notional illustration based on those properties is provided in Figure 2.

Meanwhile, the full count $C_N^\delta(\cdot, \cdot)$ will also grow exponentially:

$$C_N^\delta(\rho, \xi) = \exp\Big( N \Big[ \sup_{\rho' \in [\rho, \rho + \delta], \ \xi' \in [\xi, \xi + \delta]} s(\rho', \xi') \Big] + o(N) \Big)$$

with probability 1 under the distribution of the $x_s$, where $s(\rho, \xi)$ is the two-dimensional microcanonical
entropy density function for the pair $\big( \frac{1}{N} \log P(s), \frac{1}{N} x_s \big)$. In Section V, we will show how to compute
$s(\rho, \xi)$ (see Theorem O), which is of course closely related to $s(\rho)$. Again, a notional illustration is
provided in Figure 2.

Fig. 2. Notional illustrations of the microcanonical entropy densities $s(\rho)$ (left) and $s(\rho, \xi)$ (right). $s(\rho)$ is the exponential
growth rate of the number of paths $s$ satisfying $\frac{1}{N} \log P(s) \approx \rho$, whereas $s(\rho, \xi)$ is, with probability 1, the exponential growth
rate of the number of paths satisfying $\frac{1}{N} \log P(s) \approx \rho$ and $\frac{1}{N} x_s \approx \xi$. The density function $s(\rho, \xi)$ has a compact support,
outside of which the density $s(\rho, \xi) = -\infty$, meaning that there is no path there. Analytical expressions for these functions are
derived in Section V.

As $N$ grows, the number of states grows exponentially, and we can let the bin width $\delta$ vanish and
approximate the sum (16) by an integral. The free energy density can then be evaluated as

$$\varphi(\beta) \approx \lim_{N \to \infty} \frac{1}{N} \mathbb{E} \log \iint \exp\big( N [s(\rho, \xi) + \beta \xi + \rho] \big) \, d\rho \, d\xi = \sup_{\rho, \xi} \big\{ s(\rho, \xi) + \beta \xi + \rho \big\}, \tag{17}$$

where the equality is obtained via the Laplace principle⁵ ll^ ; we will use a rigorous formulation of this
principle in Theorem 3 in the next section.
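
The Laplace principle can be illustrated numerically on a toy exponent. The sketch below does not use the entropy density $s(\rho, \xi)$ of this paper; instead it takes the hypothetical one-dimensional function $f(x) = x(1 - x)$ on $[0, 1]$, whose supremum is $0.25$, and evaluates $\frac{1}{N} \log \int \exp(N f(x)) \, dx$ by a midpoint rule for increasing $N$. The function name and grid size are illustrative assumptions.

```python
import math

def normalized_log_integral(f, N, a=0.0, b=1.0, grid=20000):
    """(1/N) log of a midpoint-rule approximation to the integral of exp(N f(x)) on [a, b]."""
    h = (b - a) / grid
    values = [N * f(a + (k + 0.5) * h) for k in range(grid)]
    peak = max(values)                       # factor out the peak to avoid overflow
    total = sum(math.exp(v - peak) for v in values)
    return (peak + math.log(total * h)) / N

def f(x):
    """Toy exponent with sup f = 0.25, attained at x = 0.5."""
    return x * (1.0 - x)

# Gap between the normalized log-integral and sup f, for growing N.
errors = [abs(normalized_log_integral(f, N) - 0.25) for N in (10, 100, 1000, 10000)]
```

The gap shrinks roughly like $\frac{\log N}{N}$, which is the $o(N)$ correction in the exponent: only the location and height of the peak survive in the limit.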

To actually compute $\varphi(\beta)$, we will need to evaluate the supremum in (17). As it turns out, the microcanonical entropy density $s(\rho,\xi)$ has a compact support (see Figures 2 and 3), outside of which the density $s(\rho,\xi) = -\infty$. The supremum can thus only be achieved in the interior or on the boundary of the support region. As illustrated in Figure 3, the location where the supremum is achieved depends on whether $\beta$ is greater or less than a threshold of $\sqrt{2H}$, where $H$ is the entropy rate of the Markov chain $P$ (defined in Section V). As shown in the figure, below the threshold, the supremum is achieved at a critical point in the interior of the support region; as $\beta$ increases the critical point moves up along the line $\rho = -H$ until it hits the boundary. As $\beta$ continues to increase beyond the threshold, the location of the supremum moves along the boundary in a direction of decreasing $\rho$. The change in behavior at the threshold corresponds to a phase transition in $\varphi(\beta)$. In Section V-C we will provide a closed-form expression for $\varphi(\beta)$ below the threshold, and a parametric representation for it above the threshold. The

Footnote: The Laplace principle states that when $N$ is very large, $\int \exp(N f(x))\,dx = \exp\big(N \sup_x f(x) + o(N)\big)$, i.e. the integral is dominated by the peak. This is also known as the saddle-point technique, a powerful tool in asymptotic integration [38].

Fig. 3. The location of the supremum that ultimately gives us $\varphi(\beta)$ is illustrated here. The entropy density $s(\rho,\xi)$ is finite only in the compact region $\mathcal{A}$ illustrated here; this is also the effective domain of the large deviation rate function $I(\rho,\xi)$, which will be defined in (23). Below the threshold, the supremum in (17) is achieved at a critical point in the interior; above the threshold, the supremum moves along the boundary as $\beta$ increases. The change in behavior at the threshold leads to a phase transition. Technical details will be provided in Section V.


reader who wishes to skip the technical details can skip directly to that section, where we provide these 
expressions. 


V. Rigorous Derivation 

In this section, we use results from large deviations theory to rigorously derive expressions for the lower bound.

A. Large deviations and the microcanonical entropy density 

First, we introduce the large deviations property for a sequence of probability measures:

Definition 1 (Large Deviation Property [27, pp. 35-36]): Let $\mathcal{X}$ be a complete separable metric space and $\mathcal{B}(\mathcal{X})$ be the Borel $\sigma$-field of $\mathcal{X}$. Then the sequence $\{Q_N\}_{N=1}^{\infty}$ of probability measures on $\mathcal{B}(\mathcal{X})$ satisfies the large deviations property if there is a lower semicontinuous function $I : \mathcal{X} \to [0,\infty]$ (the function may take the value $\infty$) with compact level sets such that

1) $\limsup_{N\to\infty} \frac{1}{N}\log Q_N(B) \le -\inf_{x\in B} I(x)$ for every closed set $B$ in $\mathcal{X}$, and

2) $\liminf_{N\to\infty} \frac{1}{N}\log Q_N(U) \ge -\inf_{x\in U} I(x)$ for every open set $U$ in $\mathcal{X}$.

$I(x)$ is known as the rate function.

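To apply large deviations theory, we will consider the ordered pairs $\big(\frac{1}{N}\log P(\mathbf{s}), \frac{1}{N}x_{\mathbf{s}}\big)$ for $\mathbf{s} \in \mathcal{P}^N$ as inducing an empirical measure $Q_N$; for any set $B \subset \mathbb{R}^2$,

$$Q_N(B) = \frac{1}{\#\mathcal{P}^N}\,\#\Big\{\mathbf{s} \in \mathcal{P}^N : \big(\tfrac{1}{N}\log P(\mathbf{s}), \tfrac{1}{N}x_{\mathbf{s}}\big) \in B\Big\}.$$

One way to think about this is as follows: if we choose an allowable state $\mathbf{s} \in \mathcal{P}^N$ uniformly at random (rather than choosing it by running the Markov chain), then $Q_N(B)$ is the probability that the ordered pair $\big(\frac{1}{N}\log P(\mathbf{s}), \frac{1}{N}x_{\mathbf{s}}\big)$ is in $B$. This is just the number of states in the set $B$ divided by the total number of allowable paths $\#\mathcal{P}^N$.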

Since the $\{x_{\mathbf{s}}\}$ are random, $Q_N$ itself is a random probability measure. It is important to note that there are two levels of randomness here: first, the random variables $\{\frac{1}{N}x_{\mathbf{s}}\}$ themselves, and second, the empirical probability distribution that they induce when paired with the log probabilities $\frac{1}{N}\log P(\mathbf{s})$. We will show that with probability 1, the empirical probability measure will satisfy the large deviations property in Definition 1, and we will compute the rate function $I(\rho,\xi)$.

We will need to compute the number of allowable paths $\#\mathcal{P}^N$. If every entry of the transition matrix $P$ is nonzero, then this is simple: $\#\mathcal{P}^N = M^N$. If each row of $P$ has exactly $K$ nonzero entries, meaning that each state can transition to only $K$ other states, then $\#\mathcal{P}^N = M K^{N-1}$. However, in the general case, we have:

$$\#\mathcal{P}^N = \sum_{s_1}\sum_{s_2}\cdots\sum_{s_N} \mathbb{1}(p_{s_1,s_2} \neq 0)\,\mathbb{1}(p_{s_2,s_3} \neq 0)\cdots\mathbb{1}(p_{s_{N-1},s_N} \neq 0) = \mathbf{1}^T \big(P^{(0)}\big)^{N-1}\mathbf{1},$$

where for any matrix $A$ and $t \in \mathbb{R}$, we define $A^{(t)}$ to be the sparsity-preserving Hadamard power of $A$, whose $ij$th entry is given by:

$$\big[A^{(t)}\big]_{i,j} = \begin{cases} [A]_{i,j}^{\,t} & \text{if } [A]_{i,j} \neq 0 \\ 0 & \text{if } [A]_{i,j} = 0. \end{cases}$$

In particular, $P^{(0)}$ is a 0-1 matrix that is the adjacency matrix of the directed graph underlying the Markov chain. Its $ij$th element is 1 if and only if there is a nonzero probability of transitioning to state $j$ directly from state $i$. Since $P$ is irreducible and aperiodic, so must be $P^{(0)}$. Due to the Perron-Frobenius theorem, $\lambda_{\max}(P^{(0)})$ is simple, the associated left and right eigenvectors can be chosen to be positive, and all other eigenvalues are of smaller magnitude, so we can see that $\#\mathcal{P}^N$ grows exponentially with rate

$$\lim_{N\to\infty} \frac{1}{N}\log \#\mathcal{P}^N = \log\lambda_{\max}\big(P^{(0)}\big).$$
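The path-counting identity above can be checked numerically. The following is a minimal sketch, not the authors' code: the 3-state adjacency matrix `ADJ` and the helper names `count_paths` and `perron_eigenvalue` are hypothetical, and the largest eigenvalue is obtained by plain power iteration (which assumes an irreducible, aperiodic adjacency structure).

```python
import math

def count_paths(adj, n_steps):
    """Number of allowable length-n_steps paths, i.e. 1^T A^(n_steps-1) 1."""
    m = len(adj)
    v = [1] * m  # v[i]: number of allowable paths of the current length starting at i
    for _ in range(n_steps - 1):
        v = [sum(adj[i][j] * v[j] for j in range(m)) for i in range(m)]
    return sum(v)

def perron_eigenvalue(mat, iters=2000):
    """Largest eigenvalue of a primitive nonnegative matrix, by power iteration."""
    m = len(mat)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(w)              # v sums to 1, so this estimates the eigenvalue
        v = [x / lam for x in w]
    return lam

# Hypothetical 3-state chain: 0 -> {1, 2}, 1 -> {0}, 2 -> {0, 1}.
ADJ = [[0, 1, 1],
       [1, 0, 0],
       [1, 1, 0]]

if __name__ == "__main__":
    n = 200
    # Empirical growth rate of the path count versus log of the Perron eigenvalue.
    print(math.log(count_paths(ADJ, n)) / n, math.log(perron_eigenvalue(ADJ)))
```

For finite $N$ the two printed numbers differ by a term of order $1/N$, which is the $o(N)$ correction in the exponent.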


The first step toward showing that $Q_N$ satisfies the large deviation property with probability 1 is to show that its marginal $Q_N^1$ with respect to the first argument satisfies the large deviation property. This is simply the empirical probability measure on $\mathbb{R}$ induced by $\frac{1}{N}\log P(\mathbf{s})$ for all $\mathbf{s} \in \mathcal{P}^N$. It is not a random measure, since it does not depend on the Gaussian random variables $\{x_{\mathbf{s}}\}$. We will exploit the powerful Gärtner-Ellis theorem:

Theorem 1 (Gärtner-Ellis Theorem [27, p. 47]): Suppose we have a sequence of random variables $X_N$ taking values in $\mathbb{R}$. Let $\frac{1}{N}\log\mathbb{E}\exp(tX_N)$ be finite for every $t$, $N$. Suppose the limiting cumulant generating function (CGF), given by $c(t) \triangleq \lim_{N\to\infty}\frac{1}{N}\log\mathbb{E}\exp(tX_N)$, exists and is finite and differentiable for all $t$. Then the Legendre-Fenchel transform of $c(t)$, given by

$$I(x) = \sup_{t\in\mathbb{R}}\,\{tx - c(t)\},$$

is convex, lower semicontinuous, nonnegative, has compact level sets, satisfies $\inf_x I(x) = 0$, and is the large deviations rate function for $\frac{1}{N}X_N$.

In our case, the random variable $X_N$ is the one induced by choosing a state $\mathbf{s}$ uniformly at random from $\mathcal{P}^N$, and taking $X_N = \log P(\mathbf{s})$. We can compute the limiting CGF as:

$$\begin{aligned}
c(t) &= \lim_{N\to\infty}\frac{1}{N}\log\mathbb{E}\exp\big(t\log P(\mathbf{s})\big) \\
&= \lim_{N\to\infty}\frac{1}{N}\log\left(\frac{1}{\mathbf{1}^T (P^{(0)})^{N-1}\mathbf{1}}\sum_{s_1}\cdots\sum_{s_N}\pi_{s_1}^t\,p_{s_1,s_2}^t\,\mathbb{1}(p_{s_1,s_2}\neq 0)\cdots p_{s_{N-1},s_N}^t\,\mathbb{1}(p_{s_{N-1},s_N}\neq 0)\right) \\
&= \lim_{N\to\infty}\frac{1}{N}\log\frac{(\pi^{(t)})^T (P^{(t)})^{N-1}\mathbf{1}}{\mathbf{1}^T (P^{(0)})^{N-1}\mathbf{1}} \\
&= \log\lambda_{\max}\big(P^{(t)}\big) - \log\lambda_{\max}\big(P^{(0)}\big),
\end{aligned} \tag{18}$$


again using the Perron-Frobenius theorem, which due to the irreducibility and aperiodicity of $P$ ensures that only the top eigenvalue remains for both terms. To apply the Gärtner-Ellis theorem, we need to show that $c(t)$ is differentiable. This follows from the following proposition, which provides several properties of the function $\log\lambda_{\max}(P^{(t)})$ that we will need. To simplify the notation, we will define $\lambda_t \triangleq \lambda_{\max}(P^{(t)})$.
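As an illustration of the limit in (18), the sketch below compares the finite-$N$ expression with the eigenvalue formula. It is only a sketch under stated assumptions: the two-state chain `P`, its stationary distribution `PI`, and the helper names are hypothetical, and the eigenvalues come from power iteration rather than a library eigensolver.

```python
import math

def hadamard_power(P, t):
    """Sparsity-preserving Hadamard power: nonzero entries raised to t, zeros kept."""
    return [[p ** t if p > 0 else 0.0 for p in row] for row in P]

def perron_eigenvalue(mat, iters=2000):
    """Largest eigenvalue of a primitive nonnegative matrix, by power iteration."""
    m = len(mat)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(w)
        v = [x / lam for x in w]
    return lam

def cgf_limit(P, t):
    """c(t) = log lambda_max(P^(t)) - log lambda_max(P^(0)), the last line of (18)."""
    return (math.log(perron_eigenvalue(hadamard_power(P, t)))
            - math.log(perron_eigenvalue(hadamard_power(P, 0.0))))

def cgf_finite(P, pi, t, n):
    """(1/n) log of [pi^(t)T (P^(t))^(n-1) 1] / [1^T (P^(0))^(n-1) 1]."""
    m = len(P)

    def chain_sum(mat, init):
        v = [1.0] * m
        for _ in range(n - 1):
            v = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        return sum(init[i] * v[i] for i in range(m))

    num = chain_sum(hadamard_power(P, t), [p ** t for p in pi])
    den = chain_sum(hadamard_power(P, 0.0), [1.0] * m)
    return math.log(num / den) / n

# Hypothetical two-state chain and its stationary distribution (5/6, 1/6).
P = [[0.9, 0.1], [0.5, 0.5]]
PI = [5.0 / 6.0, 1.0 / 6.0]

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, cgf_finite(P, PI, t, 400), cgf_limit(P, t))
```

The finite-$N$ value is computed without renormalization, so this sketch is only suitable for moderate $N$; a longer horizon would need the products rescaled at each step.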


Proposition 3: The function $\log\lambda_t$ satisfies the following properties:

(1) $\log\lambda_t$ is finite, analytic, and convex on $\mathbb{R}$.

(2) $\log\lambda_t$ is in fact strictly convex on $\mathbb{R}$ unless $P$ is the transition matrix for a uniform random walk on a regular graph, i.e. there is some integer $K \le M$ such that each row of $P$ has exactly $K$ nonzero entries, all of which are $\frac{1}{K}$. In that case, $\log\lambda_t = (1-t)\log K$.

(3) Let $a_t$ and $b_t$ be the left and right Perron-Frobenius eigenvectors of $P^{(t)}$, respectively. Then the derivative is given by:

$$\frac{d}{dt}\log\lambda_t = \frac{a_t^T\big[(\log P)\circ P^{(t)}\big]b_t}{a_t^T\big[P^{(t)}\big]b_t}, \tag{19}$$

where the log operates only on the nonzero entries of $P$, and $\circ$ is the Hadamard (entrywise) product.

(4) The range of $\frac{d}{dt}\log\lambda_t$ is given by

$$\inf_t \frac{d}{dt}\log\lambda_t = \liminf_{N\to\infty}\,\min_{\mathbf{s}\in\mathcal{P}^N}\frac{1}{N}\log P(\mathbf{s}) \triangleq \rho_{\min} \tag{20}$$

$$\sup_t \frac{d}{dt}\log\lambda_t = \limsup_{N\to\infty}\,\max_{\mathbf{s}\in\mathcal{P}^N}\frac{1}{N}\log P(\mathbf{s}) \triangleq \rho_{\max} \tag{21}$$

Proof: See Appendix B. ■

Now we can prove the following proposition:

Proposition 4: $Q_N^1$ has a large deviations property with rate function

$$I_1(\rho) = \sup_t\,\{t\rho - \log\lambda_t + \log\lambda_0\} = \log\lambda_0 - s(\rho),$$

where $s(\rho) \triangleq \inf_t\,\{\log\lambda_t - t\rho\}$.

Proof: Since $\log\lambda_t$ is analytic, the limiting CGF $c(t)$ as defined in (18) is differentiable, and the proposition follows from the Gärtner-Ellis theorem. ■
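The derivative formula (19) lends itself to a quick numerical sanity check against a central finite difference of $\log\lambda_t$. The sketch below is illustrative only: the chain `P` and all function names are hypothetical, and both Perron-Frobenius eigenvectors are found by power iteration.

```python
import math

def hadamard_power(P, t):
    """Sparsity-preserving Hadamard power of P."""
    return [[p ** t if p > 0 else 0.0 for p in row] for row in P]

def perron(mat, iters=2000):
    """Power iteration: (largest eigenvalue, positive right eigenvector summing to 1)."""
    m = len(mat)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(w)
        v = [x / lam for x in w]
    return lam, v

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def log_lambda(P, t):
    return math.log(perron(hadamard_power(P, t))[0])

def dlog_lambda(P, t):
    """Right-hand side of (19): a^T[(log P) o P^(t)]b / a^T[P^(t)]b."""
    Pt = hadamard_power(P, t)
    _, b = perron(Pt)                # right eigenvector
    _, a = perron(transpose(Pt))     # left eigenvector
    m = len(P)
    num = sum(a[i] * math.log(P[i][j]) * Pt[i][j] * b[j]
              for i in range(m) for j in range(m) if P[i][j] > 0)
    den = sum(a[i] * Pt[i][j] * b[j] for i in range(m) for j in range(m))
    return num / den

def dlog_lambda_fd(P, t, h=1e-5):
    """Central finite difference of log lambda_t, for comparison."""
    return (log_lambda(P, t + h) - log_lambda(P, t - h)) / (2 * h)

# Hypothetical two-state chain.
P = [[0.9, 0.1], [0.5, 0.5]]

if __name__ == "__main__":
    for t in (0.3, 0.7, 1.0):
        print(t, dlog_lambda(P, t), dlog_lambda_fd(P, t))
```

Because the ratio in (19) is invariant to how the eigenvectors are scaled, the sketch normalizes each one to sum to 1 for convenience.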

To complete the large deviations analysis, we will need to use several properties of $s(\rho)$. One quantity that will be important is the entropy rate of $P$:

Definition 2: The entropy rate of an irreducible and aperiodic Markov chain $P$ is given by

$$H = -\sum_i \pi_i \sum_j p_{i,j}\log p_{i,j},$$

Fig. 4. The basic properties of the functions $\log\lambda_t$ and $s(\rho)$ are illustrated here. $\log\lambda_t$ is a convex function (strictly convex except for a degenerate case); its value at $t = 1$ is 0, and it has limiting slopes $\rho_{\min}$ and $\rho_{\max}$. $s(\rho)$ is nonnegative and concave, takes the value $H$ at $\rho = -H$ (where the slope is $-1$), and is finite only on $[\rho_{\min}, \rho_{\max}]$. Its peak and the location thereof are determined by the value and slope, respectively, of $\log\lambda_t$ at $t = 0$. (For the degenerate case of a uniform random walk on a $K$-regular graph, the curves look different: $\log\lambda_t$ is just a linear function $(1-t)\log K$, and $s(\rho)$ is only finite at a single point, $\rho = -\log K$, where $s(-\log K) = \log K$.)


where $\pi$ is the unique stationary distribution. The entropy rate can be understood as the conditional entropy of the next state given the current state, averaged over the stationary distribution.
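As a concrete instance of Definition 2, the sketch below computes the entropy rate in nats for a uniform random walk on a small odd cycle, where each row of $P$ has two entries equal to $\frac{1}{2}$ and so $H = \log 2 \approx 0.693$. The helper names and the choice of a 5-node cycle are assumptions made for illustration; the stationary distribution is found by simple iteration of $\pi \leftarrow \pi P$, which assumes the chain is irreducible and aperiodic.

```python
import math

def stationary(P, iters=5000):
    """Stationary distribution of an irreducible, aperiodic chain via pi <- pi P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def entropy_rate(P):
    """H = -sum_i pi_i sum_j p_ij log p_ij (natural log, so the unit is nats)."""
    pi = stationary(P)
    return -sum(pi[i] * p * math.log(p)
                for i, row in enumerate(P) for p in row if p > 0)

def cycle_walk(m):
    """Uniform random walk on a cycle with m nodes (m odd keeps it aperiodic)."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][(i + 1) % m] = 0.5
        P[i][(i - 1) % m] = 0.5
    return P

if __name__ == "__main__":
    print(entropy_rate(cycle_walk(5)))               # uniform walk on a 5-cycle
    print(entropy_rate([[0.9, 0.1], [0.5, 0.5]]))    # a non-uniform two-state chain
```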

This definition will be important in the following proposition:

Proposition 5: If $P$ is the transition matrix for a uniform random walk on a $K$-regular graph, then $s(\rho)$ is given by

$$s(\rho) = \begin{cases} \log K & \text{if } \rho = -\log K \\ -\infty & \text{if } \rho \neq -\log K. \end{cases} \tag{22}$$

Otherwise, $s(\rho)$ satisfies the following properties:

(1) $s : \mathbb{R} \to \mathbb{R}\cup\{-\infty\}$ is a concave function that is nonnegative on its effective domain, $[\rho_{\min}, \rho_{\max}]$, where $\rho_{\min}$ and $\rho_{\max}$ were defined in (20) and (21), respectively.

(2) $s(\rho)$ is continuous in $(\rho_{\min}, \rho_{\max})$, and continuous from above at $\rho_{\min}$ and $\rho_{\max}$.

(3) $s(\rho)$ is differentiable on $(\rho_{\min}, \rho_{\max})$. The function $s'(\rho)$ is one-to-one and $-s'(\rho)$ is the inverse function of $\frac{d}{dt}\log\lambda_t$.

(4) $s(-H) = H$ and $s'(-H) = -1$. Meanwhile, $s(\rho_0) = \log\lambda_0$ and $s'(\rho_0) = 0$, where

$$\rho_0 = \frac{a_0^T(\log P)\,b_0}{a_0^T P^{(0)} b_0},$$

with the log again operating only on the nonzero entries of $P$.

Proof: See Appendix C. ■

We provide notional illustrations of $\log\lambda_t$ and $s(\rho)$ in the general case, based on the properties described in Propositions 3 and 5, in Figure 4.
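The properties in Proposition 5 can be explored numerically by tracing the curve $(\rho_t, s(\rho_t))$ with $\rho_t = \frac{d}{dt}\log\lambda_t$ and $s(\rho_t) = \log\lambda_t - t\rho_t$, and by comparing against a brute-force evaluation of $\inf_t\{\log\lambda_t - t\rho\}$ on a grid. This is a hedged sketch: the chain `P`, the grid, and all helper names are hypothetical.

```python
import math

def hadamard_power(P, t):
    return [[p ** t if p > 0 else 0.0 for p in row] for row in P]

def perron(mat, iters=1000):
    """Power iteration: (largest eigenvalue, right eigenvector summing to 1)."""
    m = len(mat)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(w)
        v = [x / lam for x in w]
    return lam, v

def log_lambda(P, t):
    return math.log(perron(hadamard_power(P, t))[0])

def dlog_lambda(P, t):
    """Derivative of log lambda_t from the eigenvector formula (19)."""
    Pt = hadamard_power(P, t)
    _, b = perron(Pt)
    _, a = perron([list(col) for col in zip(*Pt)])
    m = len(P)
    num = sum(a[i] * math.log(P[i][j]) * Pt[i][j] * b[j]
              for i in range(m) for j in range(m) if P[i][j] > 0)
    den = sum(a[i] * Pt[i][j] * b[j] for i in range(m) for j in range(m))
    return num / den

def entropy_density_point(P, t):
    """Return (rho_t, s(rho_t)) with s(rho_t) = log lambda_t - t * rho_t."""
    rho = dlog_lambda(P, t)
    return rho, log_lambda(P, t) - t * rho

def entropy_density_grid(P, rho, t_grid):
    """Brute-force s(rho) = inf_t {log lambda_t - t rho}, restricted to a grid of t."""
    return min(log_lambda(P, t) - t * rho for t in t_grid)

P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical two-state chain

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, entropy_density_point(P, t))
```

At $t = 1$ the point is $(-H, H)$, and at $t = 0$ it is the peak $(\rho_0, \log\lambda_0)$, in line with property (4).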

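Now we can prove the large deviation property for the two-dimensional empirical measure $Q_N$ induced by the pairs $\big(\frac{1}{N}\log P(\mathbf{s}), \frac{1}{N}x_{\mathbf{s}}\big)$:

Theorem 2: With probability 1, the empirical measure $Q_N$ satisfies the large deviation property with rate function

$$I(\rho,\xi) = \begin{cases} I_1(\rho) + \frac{\xi^2}{2} & \text{if } I_1(\rho) + \frac{\xi^2}{2} \le \log\lambda_0 \\ \infty & \text{otherwise.} \end{cases} \tag{23}$$

Proof: See Appendix D. ■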

Remark: The microcanonical entropy density functions described in Section IV and the large deviation rate functions computed in this section are closely related. Entropy density functions give the exponential growth rate for the number of states within some window; large deviation rate functions give the exponential decay rate for the probability of a uniformly chosen state in some window. Since the number of states in a window is equal to $\#\mathcal{P}^N$ times the probability under the empirical measure, we have that the microcanonical entropy density functions as illustrated in Figure 2 are given by:

$$s(\rho) = \log\lambda_0 - I_1(\rho)$$

and

$$s(\rho,\xi) = \log\lambda_0 - I(\rho,\xi) = \begin{cases} s(\rho) - \frac{\xi^2}{2} & \text{if } |\xi| \le \sqrt{2s(\rho)} \\ -\infty & \text{otherwise.} \end{cases} \tag{24}$$


B. The saddle point technique through Varadhan’s lemma 

We can now compute the free energy density $\varphi(\beta)$ given in (15). We rewrite it in terms of the empirical measure as:

$$\varphi(\beta) = \lim_{N\to\infty}\frac{1}{N}\log\big(\#\mathcal{P}^N\big) + \lim_{N\to\infty}\frac{1}{N}\,\mathbb{E}\log\iint \exp\{N[\beta\xi + \rho]\}\,Q_N(d\rho,d\xi). \tag{25}$$

We have simply rewritten the sum over all states as an integral over the discrete empirical measure induced by the states. The first term is, as we know, $\log\lambda_0$. The second term can be computed using Varadhan’s lemma [27], a rigorous formulation of the Laplace principle (or the saddle point technique) applied to measures satisfying a large deviations property:

Lemma 3 (Varadhan’s Lemma [27, p. 51]): Suppose a sequence of probability measures $\{Q_N\}$ on $\mathcal{X}$ satisfies a large deviations property with rate function $I(x)$. Let $F : \mathcal{X} \to \mathbb{R}$ be a continuous function that satisfies the tail condition

$$\lim_{L\to\infty}\limsup_{N\to\infty}\frac{1}{N}\log\int_{\{x\,:\,F(x)\ge L\}} \exp(NF(x))\,Q_N(dx) = -\infty.$$

Then

$$\lim_{N\to\infty}\frac{1}{N}\log\int_{\mathcal{X}} \exp(NF(x))\,Q_N(dx) = \sup_{x\in\mathcal{X}}\,\{F(x) - I(x)\}.$$


We now have all the machinery in place to prove the main result:

Theorem 3: The free energy density is given by

$$\varphi(\beta) = \sup_{\rho,\xi}\,\{s(\rho,\xi) + \beta\xi + \rho\}, \tag{26}$$

where $s(\rho,\xi)$ is the microcanonical entropy density given in (24).

Proof: To apply Varadhan’s lemma, we need to show the tail condition

$$\lim_{L\to\infty}\limsup_{N\to\infty}\frac{1}{N}\log\iint_{\{(\rho,\xi)\,:\,\beta\xi+\rho\ge L\}} \exp\{N[\beta\xi + \rho]\}\,Q_N(d\rho,d\xi) = -\infty.$$

But this is simple. For all large enough $L$, the region $R = \{(\rho,\xi) : \beta\xi + \rho \ge L\}$ has no intersection with the support of $I(\rho,\xi)$, and thus it satisfies $Q_N(R) = 0$ for all sufficiently large $N$ with probability 1. Thus the tail condition holds, and Varadhan’s lemma gives us that, almost surely,

$$\lim_{N\to\infty}\frac{1}{N}\log\sum_{\mathbf{s}\in\mathcal{P}^N}\exp\big(\beta x_{\mathbf{s}} + \log P(\mathbf{s})\big) = \log\lambda_0 + \sup_{\rho,\xi}\,\{\beta\xi + \rho - I(\rho,\xi)\} = \sup_{\rho,\xi}\,\{s(\rho,\xi) + \beta\xi + \rho\}. \tag{27}$$

In general, almost sure convergence does not guarantee the convergence of the expectation. However, if a sequence of random variables is uniformly integrable, then almost sure convergence (indeed, merely convergence in probability) guarantees convergence in $L^1$, which is stronger than convergence of the expectation. Uniform integrability is a sort of joint tail condition for a sequence of random variables. As it turns out, the sequence of random variables $\frac{1}{N}\log\big(\sum_{\mathbf{s}} P(\mathbf{s})\exp(\beta x_{\mathbf{s}})\big)$ is uniformly integrable. Rather than belabor the point here, we will prove this fact (after formally defining uniform integrability) in Appendix E. This then immediately gives us the statement of the theorem. ■


C. Evaluating the bound 

Now we are in a position to actually compute $\varphi(\beta)$, which will then give us a bound on the error exponent $\eta$. We start with the degenerate case, which has a closed form expression:

Proposition 6: If $P$ is the transition matrix for a uniform random walk on a $K$-regular graph, then the error exponent satisfies

$$\eta \ge \begin{cases} 0, & \text{if } \beta \le \sqrt{2\log K} \\ \frac{\beta^2}{2} - \beta\sqrt{2\log K} + \log K, & \text{otherwise.} \end{cases} \tag{28}$$

Proof: Combining (22), (24) and (26), we have $\varphi(\beta) = \sup_{|\xi|\le\sqrt{2\log K}}\big\{\beta\xi - \frac{\xi^2}{2}\big\}$. The supremum can be solved exactly; using the bound $\eta \ge \frac{\beta^2}{2} - \varphi(\beta)$ gives us (28). ■
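To make the step "the supremum can be solved exactly" concrete, the sketch below evaluates the closed form (28) and compares it with a brute-force maximization of $\beta\xi - \frac{\xi^2}{2}$ over a grid on $[-\sqrt{2\log K}, \sqrt{2\log K}]$. The function names and grid size are hypothetical choices for illustration.

```python
import math

def regular_graph_bound(beta, K):
    """Closed-form lower bound (28) for a uniform random walk on a K-regular graph."""
    threshold = math.sqrt(2.0 * math.log(K))
    if beta <= threshold:
        return 0.0
    return beta ** 2 / 2.0 - beta * threshold + math.log(K)

def regular_graph_bound_numeric(beta, K, grid=20001):
    """Same bound, with phi(beta) found by a grid search over the allowed xi."""
    threshold = math.sqrt(2.0 * math.log(K))
    phi = max(beta * xi - xi * xi / 2.0
              for xi in (-threshold + 2.0 * threshold * k / (grid - 1)
                         for k in range(grid)))
    return beta ** 2 / 2.0 - phi

if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 3.0):
        print(beta, regular_graph_bound(beta, 2), regular_graph_bound_numeric(beta, 2))
```

Above the threshold the bound equals $\frac{1}{2}(\beta - \sqrt{2\log K})^2$, which makes the continuous onset at $\beta = \sqrt{2\log K}$ easy to see.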

The general case is slightly more complicated. We have the following parametric representation (of which the degenerate case expression given in Proposition 6 is a special case):

Proposition 7: For any irreducible and aperiodic Markov chain $P$, the error exponent bound is

$$\eta \ge \begin{cases} 0, & \text{if } \beta \le \sqrt{2H} \\ \chi(\beta), & \text{if } \beta > \sqrt{2H}, \end{cases}$$

where $\chi(\beta)$ is a function that can be parametrized for $t \in (0,1]$ as:

$$\beta = \frac{1}{t}\sqrt{2(\log\lambda_t - t\rho_t)}, \qquad \chi(\beta) = \frac{1-2t}{t^2}\log\lambda_t - \frac{1-t}{t}\rho_t, \tag{29}$$

and $\rho_t = \frac{d}{dt}\log\lambda_t$ is given in (19).

Proof: Since the function $s(\rho) - \frac{\xi^2}{2} + \beta\xi + \rho$ is concave and continuous on the effective domain of $s(\cdot,\cdot)$, given by $\mathcal{A} = \{(\rho,\xi) : |\xi| \le \sqrt{2s(\rho)}\}$, the supremum is achieved at a point where $s'(\rho) = -1$ and $\xi = \beta$, if one exists in the interior of $\mathcal{A}$; if not, then the supremum is achieved on the boundary of $\mathcal{A}$. See Figure 3 for an illustration. From Proposition 5, we know that $s'(-H) = -1$ (the only such point), and $s(-H) = H$. So we get $\varphi(\beta) = H - \frac{\beta^2}{2} + \beta^2 - H = \frac{\beta^2}{2}$ so long as $\beta \le \sqrt{2H}$.

Otherwise, the supremum is achieved on the boundary, so $\xi = \sqrt{2s(\rho)}$ and

$$\varphi(\beta) = \sup_{\rho\in[\rho_{\min},\rho_{\max}]} \beta\sqrt{2s(\rho)} + \rho.$$

Since the function to be maximized is differentiable, the supremum occurs at the value of $\rho$ for which

$$\frac{\beta s'(\rho)}{\sqrt{2s(\rho)}} + 1 = 0,$$

if one exists; otherwise the supremum occurs at one of the endpoints $\rho_{\min}$ or $\rho_{\max}$. We will show that such a point always exists. To see this, choose any $t \in (0,1]$. Based on the results in Propositions 3 and 5, we know that for $\rho_t = \frac{d}{dt}\log\lambda_t$, we have $s'(\rho_t) = -t$ and $s(\rho_t) = \log\lambda_t - t\rho_t$. This in turn gives us a value of $\beta$:

$$\beta_t = -\frac{\sqrt{2s(\rho_t)}}{s'(\rho_t)}$$

and a corresponding value

$$\varphi(\beta_t) = -\frac{2s(\rho_t)}{s'(\rho_t)} + \rho_t.$$

Using these representations, we can compute $\beta_1$: since we know that $\frac{d}{dt}\log\lambda_t\big|_{t=1} = -H$, we have that $\beta_1 = \sqrt{2H}$. Meanwhile, $\lim_{t\to 0^+}\beta_t = \infty$. This is because the numerator $\sqrt{2s(\rho_0)} = \sqrt{2\log\lambda_0} > 0$ by the Perron-Frobenius theorem, so $s(\rho_t)$ is strictly positive in a neighborhood of $t = 0$, while the denominator $s'(\rho_t)$ approaches 0 from below. From the intermediate value theorem, we can then achieve any value of $\beta$ in $[\sqrt{2H}, \infty)$ by choosing some $t \in (0,1]$. Thus we have a fully parametric representation, and substituting the known values of $s(\rho_t)$ and $s'(\rho_t)$ and applying the bound $\eta \ge \frac{\beta^2}{2} - \varphi(\beta)$ gives us the result. ■
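The parametric representation (29) can be traced numerically by sweeping $t$ over $(0,1]$. The sketch below is one possible implementation under stated assumptions: the example chains and helper names are hypothetical, and $\lambda_t$ and the eigenvectors in (19) are computed by power iteration.

```python
import math

def hadamard_power(P, t):
    return [[p ** t if p > 0 else 0.0 for p in row] for row in P]

def perron(mat, iters=2000):
    """Power iteration: (largest eigenvalue, right eigenvector summing to 1)."""
    m = len(mat)
    v = [1.0 / m] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(w)
        v = [x / lam for x in w]
    return lam, v

def log_lambda_and_slope(P, t):
    """Return (log lambda_t, rho_t), with rho_t from the eigenvector formula (19)."""
    Pt = hadamard_power(P, t)
    lam, b = perron(Pt)
    _, a = perron([list(col) for col in zip(*Pt)])
    m = len(P)
    num = sum(a[i] * math.log(P[i][j]) * Pt[i][j] * b[j]
              for i in range(m) for j in range(m) if P[i][j] > 0)
    den = sum(a[i] * Pt[i][j] * b[j] for i in range(m) for j in range(m))
    return math.log(lam), num / den

def bound_point(P, t):
    """One point (beta_t, chi(beta_t)) of the curve (29), valid for 0 < t <= 1."""
    log_lam, rho = log_lambda_and_slope(P, t)
    beta = math.sqrt(2.0 * (log_lam - t * rho)) / t
    chi = (1.0 - 2.0 * t) / t ** 2 * log_lam - (1.0 - t) / t * rho
    return beta, chi

P_GENERAL = [[0.9, 0.1], [0.5, 0.5]]                           # hypothetical chain
P_REGULAR = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]  # K = 2 per row

if __name__ == "__main__":
    for t in (1.0, 0.8, 0.6, 0.4, 0.2):
        print(t, bound_point(P_GENERAL, t))
```

At $t = 1$ the curve starts at $(\sqrt{2H}, 0)$, and for the regular example the points fall on the closed form of Proposition 6.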

The bound given in Proposition 7 is equal to 0 when the SNR is below a threshold: $\beta^2 \le 2H$. However, it is strictly positive for SNR above the threshold. Thus, we can guarantee strong performance when the SNR is greater than twice the entropy rate of the Markov chain. The entropy rate is smaller when the Markov structure is more restrictive; thus, the stronger our information about the dynamics of the process, the stronger the performance of the detection. Furthermore, at very high SNR $\beta^2 \gg 2H$, we can use the parametric representation (29) to show that $\frac{\beta^2}{2} - O(\beta) \le \eta \le \frac{\beta^2}{2}$, meaning the upper bound derived in Section III-B becomes tight. This is to be expected; at very high SNR, the knowledge of the true state path is not necessary to improve performance.


D. Numerical Verification 

From Lemma 1, which equates the error exponent to the Kullback-Leibler divergence rate, and Proposition 1, which says the normalized log likelihood ratio converges almost surely to $-\kappa = -\eta$, and the fact that the log likelihood ratio can be computed efficiently, we have a simple Monte Carlo technique for estimating the true $\eta$. The only caveat is to prevent numerical underflow through a suitable renormalization procedure.
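One way to realize such a renormalized computation is a forward recursion over the Markov chain, rescaling the running vector at every time step and accumulating the logarithms of the scale factors. The sketch below is an illustration and not necessarily the procedure used for the reported simulations. It assumes unit-variance Gaussian noise, an elevated mean of $\beta$ at the walker's location, and an initial distribution `pi`; all function names are hypothetical. A brute-force sum over all paths is included only to check the recursion on a tiny example.

```python
import itertools
import math
import random

def normalized_llr(Y, P, pi, beta):
    """(1/N) log likelihood ratio via a forward recursion with per-step rescaling.

    Y[j][n] is the observation at node j and time n (assumed unit-variance noise,
    mean beta at the walker's node under the alternative hypothesis).
    """
    M, N = len(Y), len(Y[0])

    def gain(j, n):
        # Per-sample likelihood ratio of N(beta, 1) against N(0, 1).
        return math.exp(beta * Y[j][n] - beta ** 2 / 2.0)

    alpha = [pi[j] * gain(j, 0) for j in range(M)]
    log_total = 0.0
    for n in range(N):
        if n > 0:
            alpha = [sum(alpha[i] * P[i][j] for i in range(M)) * gain(j, n)
                     for j in range(M)]
        scale = sum(alpha)
        log_total += math.log(scale)         # keep the magnitude in log form
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return log_total / N

def brute_force_llr(Y, P, pi, beta):
    """Same quantity by summing over every path; only feasible for tiny M and N."""
    M, N = len(Y), len(Y[0])
    total = 0.0
    for s in itertools.product(range(M), repeat=N):
        prob = pi[s[0]]
        for a, b in zip(s, s[1:]):
            prob *= P[a][b]
        total += prob * math.exp(sum(beta * Y[s[n]][n] - beta ** 2 / 2.0
                                     for n in range(N)))
    return math.log(total) / N

if __name__ == "__main__":
    rng = random.Random(0)
    P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
    pi = [1.0 / 3.0] * 3
    # Noise-only data: minus the normalized LLR is a Monte Carlo estimate of eta.
    Y = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
    print(-normalized_llr(Y, P, pi, beta=2.0))
```

The recursion costs on the order of $M^2 N$ operations for a dense transition matrix, which is what makes the Monte Carlo estimate practical compared with summing over all paths.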

We used this Monte Carlo technique to estimate the error exponents over a range of SNRs for several Markov chains. In Figure 5 we compare the Monte Carlo simulations to the lower bound obtained using the parametric representation (29).

Although the phase transition appears only in the lower bound, the true error exponent curves appear to exhibit a smoothed version of the phase transition. Below the threshold the error exponent is quite small. It is bounded below by the sum detector’s error exponent, as we showed in Section III-B. Of course, the sum detector completely ignores the structure of the problem, and when $M$ is large, this bound is practically 0. Meanwhile, above the threshold the error exponent grows quickly with increasing SNR. Thus the simple test of comparing the SNR $\beta^2$ against $2H$ suffices to determine whether one should expect good or bad detection performance.

Fig. 5. Error exponent curves are plotted for random walks on four graphs, from top to bottom: a cycle graph with 101 vertices ($H = 0.693$ nats), a 32 × 32 grid ($H = 1.58$ nats), a random geometric graph with 1000 vertices ($H = 2.09$ nats), and a Watts-Strogatz small world graph [39] ($H = 3.41$ nats). The solid curve is the error exponent computed via Monte Carlo simulations. The green dashed curve is the sum-detector lower bound, which is barely nontrivial because $M$ is large. The blue dashed curve is our statistical physics-based analytic lower bound, computed using the parametric representation (29). The analytic threshold (SNR $= 2H$) is shown as well. At the same SNR level, the higher the entropy rate of the Markov chain, the worse the detector performance.

VI. Conclusions

In this paper, we studied the problem of detecting a random walk on a graph from spatiotemporal measurements corrupted by Gaussian noise. We modeled the problem as a combinatorial hypothesis testing problem and studied the type-II error exponent of the optimal Neyman-Pearson detectors. We proved the existence of the error exponent and the fact that it is equal to the limiting Kullback-Leibler divergence rate between the two hypotheses. We showed how concepts from statistical physics could be used to analyze this quantity, and rigorously proved a bound for the error exponent. Monte Carlo simulations show that, unlike the sum detector bound, our bound fully captures the behavior of the error exponent. In particular, the bound provides us with a simple test for whether to expect strong or weak performance: if the SNR is greater than twice the entropy rate of the random walk, then detection will be easy.


Appendix 

A. Proof of Lemma 1

The proof is a rather straightforward generalization of the proof of the standard Chernoff-Stein lemma given in [2?]. Consider the sequence of optimal detectors $\delta_N$, i.e., the Neyman-Pearson detectors that choose $\mathcal{H}_1$ if $\ell_N > \tau_N$ and $\mathcal{H}_0$ otherwise, where $\tau_N$ is a sequence of thresholds chosen to satisfy the false alarm constraint $P^N_{\text{false\_alarm}} \le \epsilon$ for some fixed $\epsilon \in (0,1)$. The false alarm and miss detection probabilities are then given by

$$P^N_{\text{false\_alarm}} = P_0(\ell_N > \tau_N)$$

and

$$P^N_{\text{miss}} = P_1(\ell_N \le \tau_N),$$

respectively. Note that we already have that $\liminf_{N\to\infty}\tau_N \ge -\kappa$: if that were not the case, then since $\ell_N \to -\kappa$ in probability under $\mathcal{H}_0$, we would have $\limsup_{N\to\infty} P^N_{\text{false\_alarm}} = 1$, which would violate the false alarm constraint.

Noting that $P_1(Y^N) = \exp(N\ell_N)P_0(Y^N)$, we can rewrite the miss detection probability as

$$P^N_{\text{miss}} = \mathbb{E}_1\,\mathbb{1}(\ell_N \le \tau_N) = \mathbb{E}_0\,\mathbb{1}(\ell_N \le \tau_N)\exp(N\ell_N), \tag{30}$$

where $\mathbb{1}(\cdot)$ is the indicator function, since multiplying by $\exp(N\ell_N)$ converts the density $P_0(\cdot)$ to the density $P_1(\cdot)$. Choosing an arbitrary $\delta > 0$, we have

\begin{align}
P_0(\ell_N \in [-\kappa-\delta, \tau_N]) &= 1 - P_0(\ell_N < -\kappa-\delta) - P_0(\ell_N > \tau_N) \tag{31} \\
&\ge 1 - P_0(\ell_N < -\kappa-\delta) - \epsilon, \tag{32}
\end{align}

since the final term in (31) is just the false alarm probability, which is constrained. It then holds that

$$\begin{aligned}
\frac{1}{N}\log P^N_{\text{miss}} &= \frac{1}{N}\log\mathbb{E}_0\big[\mathbb{1}(\ell_N \le \tau_N)\exp(N\ell_N)\big] \\
&\ge \frac{1}{N}\log\mathbb{E}_0\big[\mathbb{1}(\ell_N \in [-\kappa-\delta,\tau_N])\exp(N\ell_N)\big] \\
&\ge -\kappa - \delta + \frac{1}{N}\log P_0(\ell_N \in [-\kappa-\delta,\tau_N]) \\
&\ge -\kappa - \delta + \frac{1}{N}\log\big[1 - \epsilon - P_0(\ell_N < -\kappa-\delta)\big],
\end{aligned} \tag{33}$$

from which we can conclude that $\liminf_{N\to\infty}\frac{1}{N}\log P^N_{\text{miss}} \ge -\kappa$, since $\delta$ can be made arbitrarily small and the last term on the right-hand side of (33) vanishes.

Now, instead, suppose we simply fix $\tau_N = -\kappa + \delta$ for every $N$. Then clearly $P^N_{\text{false\_alarm}} \to 0$ because $\ell_N \to -\kappa$ in probability. Thus, eventually, $P^N_{\text{false\_alarm}} \le \epsilon$. Meanwhile, the maximum value of the quantity inside the integral in (30) is $\exp(N\tau_N)$, so we have that $\frac{1}{N}\log P^N_{\text{miss}} \le \tau_N = -\kappa + \delta$. So $\limsup_{N\to\infty}\frac{1}{N}\log P^N_{\text{miss}} \le -\kappa$, since again $\delta$ is arbitrary.

We have shown the following: (1) any sequence of Neyman-Pearson detectors satisfying the false alarm constraint $P^N_{\text{false\_alarm}} \le \epsilon$ satisfies $\liminf_{N\to\infty}\frac{1}{N}\log P^N_{\text{miss}} \ge -\kappa$, and (2) there exists a sequence of Neyman-Pearson detectors satisfying the false alarm constraint $P^N_{\text{false\_alarm}} \le \epsilon$ for which $\limsup_{N\to\infty}\frac{1}{N}\log P^N_{\text{miss}} \le -\kappa$. Thus for the optimal sequence of detectors, we have $\eta \triangleq -\lim_{N\to\infty}\frac{1}{N}\log P^N_{\text{miss}} = \kappa$. This holds for any $\epsilon \in (0,1)$, so the lemma is proved.


B. Proof of Proposition 3 [Properties of $\log\lambda_t$]

(1) $P^{(t)}$ is an irreducible nonnegative matrix for any $t$, just as $P$ is. Thus the Perron-Frobenius theorem tells us that $\lambda_{\max}(P^{(t)})$ is a real, positive eigenvalue, so $\log\lambda_t$ is well-defined and finite. Since $w^t$ is an analytic function of $t$ for any positive $w$ and the zero function is an analytic function, we have that every entry of $P^{(t)}$ is analytic in $t$. Standard perturbation-theoretic results [40] tell us that on any neighborhood in which a matrix function is analytic and an eigenvalue remains isolated from the rest of the spectrum (i.e. has no multiplicity), it can be analytically continued to the rest of that neighborhood. Since $\lambda_t$ is the Perron-Frobenius eigenvalue, it is simple and thus isolated. Therefore, it is an analytic function of $t$ everywhere. Since it is positive, $\log\lambda_t$ is analytic as well. The convexity of $\log\lambda_t$ follows from a property of Hadamard powers proven by Horn and Johnson in [41, p. 361]: for any nonnegative matrices $A$ and $B$ and $0 \le \alpha \le 1$, they showed that

$$\lambda_{\max}\big(A^{(\alpha)}\circ B^{(1-\alpha)}\big) \le \lambda_{\max}(A)^{\alpha}\,\lambda_{\max}(B)^{1-\alpha}. \tag{34}$$

Taking $A = P^{(t)}$ and $B = P^{(s)}$ for arbitrary $t > s > 0$, and using the fact that log is an increasing function, we have

$$\log\lambda_{\max}\big(P^{(\alpha t + (1-\alpha)s)}\big) \le \alpha\log\lambda_{\max}\big(P^{(t)}\big) + (1-\alpha)\log\lambda_{\max}\big(P^{(s)}\big), \tag{35}$$

which by definition gives us the convexity of $\log\lambda_t$.

(2) Strict convexity means that the inequality in (35) must be strict. Since log is strictly increasing, equality holds if and only if it holds in (34), which for irreducible matrices holds if and only if there exists a positive scalar $\gamma$ and a positive diagonal matrix $D$ such that $\gamma A = D^{-1}BD$ [41, p. 361]. For our problem, then, equality holds if and only if there are some $t > s > 0$ such that for all $i, j$

$$\gamma\,p_{ij}^{\,t} = d_i^{-1}\,p_{ij}^{\,s}\,d_j$$

for some positive constants $\gamma$ and $d_i$, $i = 1,\ldots,M$. Thus either $p_{ij} = 0$ or

$$p_{ij} = \gamma^{-\frac{1}{t-s}}\,d_i^{-\frac{1}{t-s}}\,d_j^{\frac{1}{t-s}}.$$

Summing over $j$ on both sides of this equation tells us that $d_i$ must be a constant. This means that all of the nonzero entries of $P$ must be constant. Since the row sums of $P$ must be one, this means that every row of $P$ having exactly $K$ nonzero entries equal to $\frac{1}{K}$ for some $K \le M$ is the only situation in which strict convexity does not hold.

So what exactly is $\log\lambda_t$ in that case? Consider the test vector $\mathbf{1}$: we have $P^{(t)}\mathbf{1} = K^{1-t}\mathbf{1}$. So the test vector is an eigenvector. The Perron-Frobenius theorem states that any positive eigenvector must correspond to the largest eigenvalue. Since $\mathbf{1}$ has all positive entries, we have that $\lambda_t = K^{1-t}$, so $\log\lambda_t = (1-t)\log K$.

(3) As before, we use the perturbation results. In addition to an analytic eigenvalue function, in the case of a simple eigenvalue, there are analytic functions for the left and right eigenvectors. These can be normalized as desired. So we have analytic functions $a_t$ and $b_t$ such that

$$a_t^T P^{(t)} = \lambda_t a_t^T, \qquad P^{(t)} b_t = \lambda_t b_t,$$

and normalized such that $a_t^T b_t = 1$ and $a_t^T\mathbf{1} = 1$. We can write the largest eigenvalue function as $\lambda_t = a_t^T P^{(t)} b_t$. Using the product rule, we can compute the derivative

$$\begin{aligned}
\lambda_t' &= (a_t')^T P^{(t)} b_t + a_t^T P^{(t)} b_t' + a_t^T\big((\log P)\circ P^{(t)}\big)b_t \\
&= \lambda_t\big[(a_t')^T b_t + a_t^T b_t'\big] + a_t^T\big((\log P)\circ P^{(t)}\big)b_t \\
&= a_t^T\big[(\log P)\circ P^{(t)}\big]b_t,
\end{aligned} \tag{36}$$

where in reaching (36) we have used the fact that $a_t^T b_t = 1$ and thus $(a_t')^T b_t + a_t^T b_t' = 0$. Using the chain rule, we have that $\frac{d}{dt}\log\lambda_t = \frac{\lambda_t'}{\lambda_t}$, and the result follows. Note that the normalization of the eigenvectors is irrelevant in the final expression because the normalization factors will cancel out in the numerator and denominator.

(4) Since $\log\lambda_t$ is convex, its derivative is nondecreasing. Thus

$$\inf_t \frac{d}{dt}\log\lambda_t = \lim_{t\to-\infty}\frac{d}{dt}\log\lambda_t = \lim_{t\to-\infty}\frac{\log\lambda_t}{t},$$

where the final step results from L’Hôpital’s rule. By the same argument, we have

$$\sup_t \frac{d}{dt}\log\lambda_t = \lim_{t\to+\infty}\frac{\log\lambda_t}{t}.$$

Horn and Johnson show that these are equal to $\rho_{\min}$ and $\rho_{\max}$, respectively [41, p. 367].


C. Proof of Proposition 5 [Properties of $s(\rho)$]

As we stated in Proposition 3, if $P$ is the transition matrix for a uniform random walk on a regular graph, then $\log\lambda_t = (1-t)\log K$. If $\rho = -\log K$, then $\log\lambda_t - t\rho = \log K$, a constant, giving us $s(-\log K) = \log K$. For any other $\rho$, the function $\log\lambda_t - t\rho$ is linear but not constant, so it is unbounded and has an infimum of $-\infty$. Now consider the general case:

(1) Consider the function $-s(\rho) = \sup_t\,\{t\rho - \log\lambda_t\}$. This is the convex conjugate of $\log\lambda_t$. A convex conjugate function is guaranteed to be a convex function with range $\mathbb{R}\cup\{+\infty\}$, so $s(\rho)$ is a concave function with range $\mathbb{R}\cup\{-\infty\}$. Since $\log\lambda_t$ is strictly convex, the infimum $\inf_t\,\{\log\lambda_t - t\rho\}$ that defines $s(\rho)$ is achieved at no more than one point $t^*$ [42]. Since it is also differentiable, then if and only if the infimum is achieved at $t^*$, we have

$$\frac{d}{dt}\log\lambda_t\Big|_{t=t^*} = \rho.$$

(Footnote to the eigenvector normalization in the proof of Proposition 3: they are Perron-Frobenius eigenvectors, so they are positive and can never be orthogonal.)

If there is no such $t^*$, then $s(\rho) = -\infty$. This will be the case if $\rho < \rho_{\min}$ or $\rho > \rho_{\max}$. Suppose, however, that $\rho \in (\rho_{\min}, \rho_{\max})$. By the intermediate value theorem, there must be some $t^*$ for which $\frac{d}{dt}\log\lambda_t\big|_{t=t^*} = \rho$. Then $s(\rho) = \log\lambda_{t^*} - t^*\rho$. It remains to prove nonnegativity.

We use the following alternate expressions [41, p. 367] for $\rho_{\min}$ and $\rho_{\max}$:

$$\rho_{\min} = \min_{\substack{\text{self-avoiding loops} \\ 1\le L\le M}} \frac{1}{L}\sum_{\ell=1}^{L}\log p_{i_\ell,i_{\ell+1}}, \tag{37}$$

$$\rho_{\max} = \max_{\substack{\text{self-avoiding loops} \\ 1\le L\le M}} \frac{1}{L}\sum_{\ell=1}^{L}\log p_{i_\ell,i_{\ell+1}}, \tag{38}$$

where the extrema are over self-avoiding loops that obey the topology induced by the sparsity of $P$, so each $i_1,\ldots,i_L$ is unique, $p_{i_\ell,i_{\ell+1}} \neq 0$, and we use the convention that $i_{L+1} = i_1$. Let $i_1^*,\ldots,i_L^*$ be the self-avoiding loop achieving the maximum in (38). Define the matrix $B$ as follows: every transition in the maximal loop is given the same value as in $P^{(t)}$ (i.e., $[B]_{i_1^*,i_2^*} = p^{\,t}_{i_1^*,i_2^*}, \ldots, [B]_{i_L^*,i_1^*} = p^{\,t}_{i_L^*,i_1^*}$), and every other entry is set to 0. On an elementwise basis, then, $P^{(t)} \ge B$. If $L < M$, then $B$ is not irreducible, but the Perron-Frobenius theorem still guarantees us that it has a real eigenvalue $\lambda_{\max}(B)$ equal to its spectral radius [30]. It is not hard to verify that taking powers of $B$ eventually results in a constant multiple of a diagonal 0-1 matrix:

$$B^L = p^{\,t}_{i_1^*,i_2^*}\cdots p^{\,t}_{i_L^*,i_{L+1}^*}\,D = \exp(tL\rho_{\max})\,D. \tag{39}$$

Here, the diagonal entries of $D$ associated with the indices $i_1^*,\ldots,i_L^*$ are 1, and the others are all 0.

Now if we let $v$ be an eigenvector of $B$ associated with the eigenvalue $\lambda_{\max}(B)$ whose only nonzero entries are those associated with the indices $i_1^*,\ldots,i_L^*$, we have

$$B^L v = \lambda_{\max}(B)^L v. \tag{40}$$

Combining this with (39), we obtain

$$\lambda_{\max}(B) = \exp(t\rho_{\max}).$$

Since the Perron-Frobenius eigenvalue $\lambda_{\max}(\cdot)$ is a monotonic function of the matrix entries [41], we have $\log\lambda_t \ge \log\lambda_{\max}(B) = t\rho_{\max}$ for every $t$. Now since $\log\lambda_t - t\rho_{\max} \ge 0$ for all $t$, we must have $s(\rho_{\max}) \ge 0$. A similar argument shows that $s(\rho_{\min}) \ge 0$ as well. It then follows from the concavity of $s(\rho)$ that $s(\rho) \ge 0$ on $[\rho_{\min}, \rho_{\max}]$.

(2) Any proper convex function is continuous on the interior of its effective domain, so $-s(\rho)$ is continuous on $(\rho_{\min}, \rho_{\max})$, and thus $s(\rho)$ is as well. Since $-s(\rho)$ is the Legendre-Fenchel transform of $\log\lambda_t$, which is itself a convex function, it must be lower semicontinuous. So $s(\rho)$ must be upper semicontinuous, and therefore it is continuous from above at $\rho_{\min}$ and $\rho_{\max}$.

(3) Since $\log\lambda_t$ is strictly convex (remember, we are not considering regular graphs here), there is at most one point that achieves the infimum $\inf_t\,\{\log\lambda_t - t\rho\}$ that defines $s(\rho)$. We showed earlier that as long as $\rho \in (\rho_{\min}, \rho_{\max})$, there is exactly one such point. Another basic result in convex analysis [42, Theorem 11.8] tells us that $s(\cdot)$ is then differentiable at $\rho$, and in particular $-s'(\rho) = t_\rho$, where $t_\rho$ is the argument of the minimum. Since $\log\lambda_t$ is differentiable, we also have that

$$\frac{d}{dt}\log\lambda_t\Big|_{t=t_\rho} = \rho.$$

Thus $-s'(\rho)$ is the inverse function of $\frac{d}{dt}\log\lambda_t$ as claimed. Since $\log\lambda_t$ is strictly convex, its derivative is one-to-one, and thus so is $s'(\rho)$.

(4) We know that $s(\rho) = \log\lambda_{t_\rho} - t_\rho\rho$, where $t_\rho$ is the value of $t$ at which $\frac{d}{dt}\log\lambda_t = \rho$, if such a value exists, and $s'(\rho) = -t_\rho$; otherwise $s(\rho) = -\infty$. Using the expression for the derivative from the proof of Proposition 3, and the fact that the left and right eigenvectors of $P$ are $a_1 = \pi$ and $b_1 = \mathbf{1}$, respectively, we have that

$$\frac{d}{dt}\log\lambda_t\Big|_{t=1} = \frac{\pi^T[(\log P)\circ P]\mathbf{1}}{\pi^T P\,\mathbf{1}} = \sum_i \pi_i\sum_j p_{i,j}\log p_{i,j} = -H.$$

Meanwhile, $\lambda_1 = 1$, so $t_\rho = 1$ at $\rho = -H$. So $s(-H) = \log 1 - 1\cdot(-H) = H$, and $s'(-H) = -1$.

The same argument gives us $s(\rho_0)$ and $s'(\rho_0)$, only without the nicely-interpretable values.


D. Proof of Theorem |2] 

To prove the statement of the theorem, we need to show that the upper bound

\[
\limsup_{N \to \infty} \frac{1}{N} \log Q_N(B) \le -\inf_{(\rho,\xi) \in B} I(\rho,\xi) \tag{41}
\]

holds almost surely for every closed set $B \subset \mathbb{R}^2$, and the lower bound

\[
\liminf_{N \to \infty} \frac{1}{N} \log Q_N(U) \ge -\inf_{(\rho,\xi) \in U} I(\rho,\xi) \tag{42}
\]

holds almost surely for every open set $U \subset \mathbb{R}^2$. We will use an argument parallel to Dorlas and Wedagedera's for the random energy model with an external field [43]. Let $A = \{(\rho,\xi) : h(\rho) + \frac{\xi^2}{2} \le \log \lambda_0\}$ be the effective domain of $I(\cdot,\cdot)$, i.e. the set on which it is finite. It can also be written as $A = \{(\rho,\xi) : |\xi| \le \sqrt{2 s(\rho)}\}$, the union of the hypograph of the function $\xi = \sqrt{2 s(\rho)}$ and its reflection over the $\rho$ axis. We know from Proposition 5 that $s(\rho)$ is nonnegative and concave on $[\rho_{\min}, \rho_{\max}]$. Since $\sqrt{\cdot}$ is a concave and increasing function, we have that $A$ is a convex set. A notional illustration of $A$ was shown in Figure 3.

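We will be able to build up the result for general sets by studying the behavior of a few classes of primitive sets. Consider a box $C = [\rho, \rho+\delta] \times [\xi, \xi+\delta]$ with sides of length $\delta$; suppose first that it is entirely outside of $A$. By definition, $Q_N(C) = \frac{1}{\#\mathcal{P}^N} \#\{s \in \mathcal{P}^N : \frac{1}{N} \log P(s) \in [\rho, \rho+\delta],\ \frac{x_s}{N} \in [\xi, \xi+\delta]\}$. So $\#\mathcal{P}^N Q_N(C)$ is a binomial random variable with parameters $\#\mathcal{P}^N Q_N([\rho, \rho+\delta])$ and $\sqrt{\frac{N}{2\pi}} \int_\xi^{\xi+\delta} \exp(-\frac{N x^2}{2})\,dx$. This means that

\[
E\,Q_N(C) = Q_N([\rho, \rho+\delta]) \sqrt{\frac{N}{2\pi}} \int_\xi^{\xi+\delta} \exp\left(-\frac{N x^2}{2}\right) dx \tag{43}
\]

and

\[
\mathrm{var}(Q_N(C)) \le \frac{1}{\#\mathcal{P}^N}\, Q_N([\rho, \rho+\delta]) \sqrt{\frac{N}{2\pi}} \int_\xi^{\xi+\delta} \exp\left(-\frac{N x^2}{2}\right) dx.
\]

Now for any $\epsilon$, we can choose $N'$ large enough so that for every $N > N'$,

\begin{align}
P(Q_N(C) > 0) &= P(\#\mathcal{P}^N Q_N(C) \ge 1) \tag{44} \\
&\le E(\#\mathcal{P}^N Q_N(C)) \tag{45} \\
&= \#\mathcal{P}^N Q_N([\rho, \rho+\delta]) \sqrt{\frac{N}{2\pi}} \int_\xi^{\xi+\delta} \exp\left(-\frac{N x^2}{2}\right) dx \tag{46} \\
&\le \exp\left(N[\log \lambda_0 + \epsilon]\right) \exp\left(-N\left[\inf_{r \in [\rho,\rho+\delta]} h(r) - \epsilon\right]\right) \exp\left(-N \inf_{x \in [\xi,\xi+\delta]} \frac{x^2}{2} + \log \delta + \frac{1}{2} \log \frac{N}{2\pi}\right) \nonumber \\
&= \exp\left(-N\left[\inf_{r \in [\rho,\rho+\delta],\, x \in [\xi,\xi+\delta]} \left(h(r) + \frac{x^2}{2}\right) - \log \lambda_0 - 2\epsilon\right] + \log \delta + \frac{1}{2} \log \frac{N}{2\pi}\right) \nonumber \\
&\to 0. \tag{47}
\end{align}

Here, (44) is because there is a discrete number of paths, (45) is the Markov inequality, and (46) is due to (43). This quantity converges to 0 because the coefficient on $N$ in the exponent is guaranteed to be negative for small enough $\epsilon$, since $C$ is entirely outside of $A$. We have merely proven convergence in probability, but since the probability goes to zero exponentially fast, the Borel-Cantelli lemma tells us that with probability 1, there is an $N'$ such that for every $N > N'$, we have $Q_N(C) = 0$, so $\lim_{N \to \infty} \frac{1}{N} \log Q_N(C) = -\infty$ almost surely.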
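The step from the Gaussian integral to the exponent $\inf_x x^2/2$ can be illustrated numerically. The sketch below assumes, as the density $\sqrt{N/2\pi}\exp(-Nx^2/2)$ suggests, that the second coordinate of each state is Gaussian with mean zero and variance $1/N$; the function name and the values of `xi` and `delta` are hypothetical choices made only for this example.

```python
import math

def log_box_probability(N, xi, delta):
    """log P(xi <= Z <= xi + delta) for Z ~ N(0, 1/N), assuming xi > 0.

    The complementary error function is used so that the very small tail
    probabilities are computed without catastrophic cancellation.
    """
    z_lo = math.sqrt(N) * xi
    z_hi = math.sqrt(N) * (xi + delta)
    p = 0.5 * (math.erfc(z_lo / math.sqrt(2)) - math.erfc(z_hi / math.sqrt(2)))
    return math.log(p)

xi, delta = 0.5, 0.1
limit = xi ** 2 / 2  # infimum of x^2/2 over [xi, xi + delta] when xi > 0

for N in (50, 200, 800, 2000):
    rate = -log_box_probability(N, xi, delta) / N
    print(N, rate, "limit:", limit)

rate_2000 = -log_box_probability(2000, xi, delta) / 2000
```

The printed rate approaches `limit` from above as $N$ grows; the gap is the $O(\log N / N)$ correction that the $\epsilon$ slack in the bound absorbs.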

If instead we consider the half-planes $C = \{(\rho,\xi) : \xi > \sqrt{2 \log \lambda_0} + 1\}$ or $C = \{(\rho,\xi) : \xi < -\sqrt{2 \log \lambda_0} - 1\}$, we can use the same argument (replacing the Gaussian integrals above with standard Gaussian tail bounds, and using the fact that $C$ is outside of $A$) to show that $\lim_{N \to \infty} \frac{1}{N} \log Q_N(C) = -\infty$ almost surely for these sets as well. The half-planes $C = \{(\rho,\xi) : \rho > \rho_{\max} + 1\}$ and $C = \{(\rho,\xi) : \rho < \rho_{\min} - 1\}$ also contain no states for large enough $N$ due to the definitions of $\rho_{\min}$ and $\rho_{\max}$, so again $\lim_{N \to \infty} \frac{1}{N} \log Q_N(C) = -\infty$ almost surely.

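Now, suppose we have a box $C = [\rho, \rho+\delta] \times [\xi, \xi+\delta]$, but this time it intersects the set $A$. By Chebyshev's inequality we know that for any $\epsilon$, there is an $N'$ such that for $N > N'$,

\begin{align*}
P\left(|Q_N(C) - E\,Q_N(C)| > \epsilon\, E\,Q_N(C)\right) &\le \frac{1}{\epsilon^2} \frac{\mathrm{var}(Q_N(C))}{(E\,Q_N(C))^2} \\
&\le \frac{1}{\epsilon^2} \exp\left(-N\left[\log \lambda_0 - \epsilon - \epsilon - \inf_{r \in [\rho,\rho+\delta]} h(r) - \inf_{x \in [\xi,\xi+\delta]} \frac{x^2}{2}\right] - \log \delta\right).
\end{align*}

By choosing $\epsilon$ small enough, we can guarantee that this probability decays exponentially in $N$. Thus, by the Borel-Cantelli lemma, $\lim_{N \to \infty} \frac{Q_N(C)}{E\,Q_N(C)} = 1$ with probability 1. Because $\log(\cdot)$ is continuous at 1, this gives us $\lim_{N \to \infty} \frac{1}{N} \log Q_N(C) = \lim_{N \to \infty} \frac{1}{N} \log E\,Q_N(C)$. Using (43), we can compute

\[
\lim_{N \to \infty} \frac{1}{N} \log Q_N(C) = -\inf_{(r,x) \in C} \left[h(r) + \frac{x^2}{2}\right] = -\inf_{(r,x) \in C} I(r,x),
\]

almost surely.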
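The middle step of the Chebyshev bound can be spelled out as follows. This is a sketch that relies on the binomial description of the box counts given earlier in this proof; the shorthand $n = \#\mathcal{P}^N Q_N([\rho,\rho+\delta])$ for the number of trials and $p_N$ for the Gaussian integral in (43) is introduced here only for illustration.

```latex
\begin{align*}
E\,Q_N(C) &= \frac{n\,p_N}{\#\mathcal{P}^N}, \qquad
\mathrm{var}(Q_N(C)) = \frac{n\,p_N(1-p_N)}{(\#\mathcal{P}^N)^2}
  \le \frac{n\,p_N}{(\#\mathcal{P}^N)^2}, \\
\frac{\mathrm{var}(Q_N(C))}{\epsilon^2\,(E\,Q_N(C))^2}
  &\le \frac{1}{\epsilon^2\, n\, p_N}
  = \frac{1}{\epsilon^2\, \#\mathcal{P}^N\, E\,Q_N(C)}.
\end{align*}
```

Since $\#\mathcal{P}^N$ grows like $\exp(N \log \lambda_0)$ while $E\,Q_N(C)$ decays like $\exp(-N \inf_C I)$, the ratio decays exponentially whenever $\inf_C I < \log \lambda_0$, which is the case when the box reaches into the interior of $A$.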

Using these primitives, we can prove the large deviation property directly. We start with the upper bound. Suppose $B$ is a closed set entirely outside of the effective domain of $I(\cdot,\cdot)$, i.e. $B \cap A = \emptyset$. Let $d(B,A)$ be the distance between the set $B$ and $A$. Then if we choose some $\delta < d(B,A)/\sqrt{2}$, the set $B$ can be covered by a finite number of $\delta \times \delta$ boxes that are entirely outside of $A$ plus possibly one or more of the half-planes described above: $B \subset \bigcup_{\ell=1}^{L} B_\ell$, where each $B_\ell$ is one of the primitives described above and $B_\ell \cap A = \emptyset$. Then we have

\begin{align}
\limsup_{N \to \infty} \frac{1}{N} \log Q_N(B) &\le \limsup_{N \to \infty} \frac{1}{N} \log \sum_{\ell=1}^{L} Q_N(B_\ell) \nonumber \\
&\le \lim_{N \to \infty} \frac{1}{N} \log L + \limsup_{N \to \infty} \frac{1}{N} \log\left(\max_\ell Q_N(B_\ell)\right) \nonumber \\
&= -\infty, \tag{48}
\end{align}

almost surely, since the maximum is over just a finite number of sets.

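Next suppose $B$ is a closed set that is not entirely outside the effective domain: $B \cap A \ne \emptyset$. Let $b = \inf_{(\rho,\xi) \in B} I(\rho,\xi)$. Because $I$ is continuous inside $A$, for any $\epsilon$, we can choose a $\delta$ and cover $B$ with the primitive half-planes and a finite number of boxes of width $\delta$ such that, for each square (and each half-plane, trivially) $B_\ell$, we have $\inf_{(\rho,\xi) \in B_\ell} I(\rho,\xi) \ge b - \epsilon$. We have

\begin{align*}
\limsup_{N \to \infty} \frac{1}{N} \log Q_N(B) &\le \limsup_{N \to \infty} \frac{1}{N} \log \sum_{\ell=1}^{L} Q_N(B_\ell) \\
&\le \lim_{N \to \infty} \frac{1}{N} \log L + \limsup_{N \to \infty} \frac{1}{N} \log\left(\max_\ell Q_N(B_\ell)\right) \\
&= \max_\ell \left\{ \limsup_{N \to \infty} \frac{1}{N} \log(Q_N(B_\ell)) \right\} \\
&\le -(b - \epsilon).
\end{align*}

Since $\epsilon$ can be made arbitrarily small, we have

\[
\limsup_{N \to \infty} \frac{1}{N} \log Q_N(B) \le -\inf_{(\rho,\xi) \in B} I(\rho,\xi). \tag{49}
\]

Combining this with (48) gives us the large deviation upper bound for any closed set $B$.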

Now we must prove the large deviations lower bound (42). Let $U$ be an open set. First, suppose $U \cap A = \emptyset$, meaning that the set is entirely outside the region. Then the lower bound is trivial: it amounts to proving that $\liminf_{N \to \infty} \frac{1}{N} \log Q_N(U) \ge -\infty$, which is obviously true. So let us assume that $U \cap A \ne \emptyset$. For any $\epsilon > 0$, there is a square $C$ of width $\delta$ contained entirely within $U$ such that $\inf_{(r,x) \in C} I(r,x) < \inf_{(r,x) \in U} I(r,x) + \epsilon$, by the following argument. First, note that the infimum must be achieved on the interior of $A$ (see the footnote at the end of this proof). If the infimum is achieved at a point $(\rho^*, \xi^*)$ on the interior of $U$, then we can easily just draw a box $C$ around it that is small enough to fit in $U$, and it must have the same infimum. If on the other hand the infimum is achieved on the boundary of $U$, the continuity of $I(\rho,\xi)$ means that we can choose a small open neighborhood around $(\rho^*, \xi^*)$ in which $I(\rho,\xi) < I(\rho^*, \xi^*) + \epsilon$. This neighborhood must intersect with $U$ since it is centered on a boundary point, and the intersection must be an open set since both sets are open. Then we can choose a small box $C$ that fits inside the intersection, and again we must have that $\inf_{(r,x) \in C} I(r,x) < \inf_{(r,x) \in U} I(r,x) + \epsilon$.

Using our result for boxes, we have

\begin{align*}
\liminf_{N \to \infty} \frac{1}{N} \log Q_N(U) &\ge \liminf_{N \to \infty} \frac{1}{N} \log Q_N(C) \\
&= -\inf_{(r,x) \in C} I(r,x) \\
&\ge -\inf_{(r,x) \in U} I(r,x) - \epsilon,
\end{align*}

almost surely. Since this holds for any $\epsilon$, the lower bound is proved.

Footnote: To see this, note that the boundary points of $A$ satisfy either $\frac{\xi^2}{2} - s(\rho) = 0$, in which case $I(\rho,\xi) = \log \lambda_0$ is the maximum possible value of $I$, or $\rho = \rho_{\min}$, $|\xi| < \sqrt{2 s(\rho_{\min})}$ or $\rho = \rho_{\max}$, $|\xi| < \sqrt{2 s(\rho_{\max})}$, in which case the concavity of $s(\rho)$ tells us that we can decrease $I$ by moving into the interior of $A$.


E. Uniform integrability of the free energy density 

In this appendix, we show that the sequence of random variables

\[
X_N = \frac{1}{N} \log\left(\sum_{s \in \mathcal{P}^N} P(s) \exp(\beta x_s)\right), \quad N = 1, 2, \ldots \tag{50}
\]

is uniformly integrable. Our arguments will closely follow those in Olivieri and Picco [44], who showed that the free energy density of the standard random energy model [37] is uniformly integrable. We start by recalling the definition of uniform integrability:

Definition 3: A sequence of random variables $\{X_N\}_{N \ge 1}$ is uniformly integrable if

\[
\lim_{a \to \infty} \sup_{N \ge N_0} E\left(\mathbf{1}_{|X_N| > a} |X_N|\right) = 0, \tag{51}
\]

for some $N_0 > 0$.

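To proceed, we first note that, by the definition of $\rho_{\min}$ in (20), there exists some $N_0$ such that $P(s) \ge \exp(2N\rho_{\min})$ for all $N > N_0$. (Note that $\rho_{\min}$ is negative, so $2\rho_{\min}$ is actually less than $\rho_{\min}$.) Using this inequality and the fact that $\sum_{s \in \mathcal{P}^N} P(s) = 1$, we can bound the random variable $X_N$ in (50) on both sides as

\[
2\rho_{\min} + \frac{\beta}{N} \max_{s \in \mathcal{P}^N} x_s \le X_N \le \frac{\beta}{N} \max_{s \in \mathcal{P}^N} x_s. \tag{52}
\]

Now take any $a > 1$. We can split the expectation in (51) into two parts and apply (52):

\begin{align}
E\left(\mathbf{1}_{|X_N| > a} |X_N|\right) &\le E\left(\mathbf{1}_{\frac{\beta}{N} \max_s x_s > a} \left[\frac{\beta}{N} \max_s x_s\right]\right) + E\left(\mathbf{1}_{-2\rho_{\min} - \frac{\beta}{N} \max_s x_s > a} \left[-2\rho_{\min} - \frac{\beta}{N} \max_s x_s\right]\right) \nonumber \\
&\le \sum_{K=1}^{\infty} a(K+1)\, P\left(\max_s x_s > aKN/\beta\right) \nonumber \\
&\quad + \sum_{K=1}^{\infty} a(K+1)\, P\left(\max_s x_s < -aKN/\beta - 2\rho_{\min} N/\beta\right), \tag{53}
\end{align}

where to reach (53) we have simply decomposed the integrals corresponding to the expectations into a sum of integrals from $Ka$ to $(K+1)a$ for $K = 1, 2, \ldots$, and bounded each one.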
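The two-sided bound (52) can be checked by brute force on a tiny instance. The sketch below is illustrative and rests on several assumptions not taken from the paper: a hypothetical 3-node lazy walk, a uniform initial distribution, paths of $N$ steps, and energies $x_s$ drawn i.i.d. from $N(0, N)$. Because $N$ is small here, the lower bound uses the exact minimum of $\frac{1}{N} \log P(s)$ in place of the asymptotic constant $2\rho_{\min}$.

```python
import itertools
import math
import random

# Hypothetical transition matrix: lazy walk on a 3-node path graph.
P = [[1/2, 1/2, 0.0],
     [1/3, 1/3, 1/3],
     [0.0, 1/2, 1/2]]
N, beta = 6, 1.5
rng = random.Random(0)

# Enumerate every path of N steps that has nonzero probability.
log_probs = []
for path in itertools.product(range(3), repeat=N + 1):
    p = 1 / 3  # uniform initial distribution (an assumption of this sketch)
    for i, j in zip(path, path[1:]):
        p *= P[i][j]
    if p > 0:
        log_probs.append(math.log(p))

# One independent N(0, N) energy per path.
x = [rng.gauss(0.0, math.sqrt(N)) for _ in log_probs]

# X_N = (1/N) log sum_s P(s) exp(beta * x_s), computed with log-sum-exp.
terms = [lp + beta * xs for lp, xs in zip(log_probs, x)]
m = max(terms)
X_N = (m + math.log(sum(math.exp(t - m) for t in terms))) / N

upper = beta * max(x) / N                    # uses sum_s P(s) = 1
lower = min(log_probs) / N + upper           # uses P(s) >= min_s P(s)
total_prob = sum(math.exp(lp) for lp in log_probs)

print("lower:", lower, "X_N:", X_N, "upper:", upper)
```

The printed value of `X_N` should fall between `lower` and `upper` for any seed, since both bounds hold deterministically for every realization of the energies.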

Let us consider the first probability expression in (53). Defining $\Phi(\cdot)$ as the standard Gaussian cumulative distribution function, and exploiting the fact that the ensemble $\{x_s\}$ is i.i.d., we have

\begin{align}
P\left(\max_s x_s > aKN/\beta\right) &= 1 - P\left(x_s \le aKN/\beta, \text{ for all } s \in \mathcal{P}^N\right) \nonumber \\
&= 1 - \left(1 - \Phi(-aK\sqrt{N}/\beta)\right)^{\#\mathcal{P}^N} \nonumber \\
&\le \#\mathcal{P}^N \exp\left(-\frac{a^2 K^2 N}{2\beta^2}\right), \tag{54}
\end{align}

where in reaching (54) we have used the inequality $(1 - x)^K \ge 1 - Kx$ for any positive integer $K$ and any $x \le 1$, and applied the standard Gaussian tail bound $\Phi(-t) \le \exp(-t^2/2)$ for $t > 0$ (see, e.g., [2?, p. 445]). Recall that $\#\mathcal{P}^N = \exp(N \log \lambda_0 + o(N))$. Thus, for all sufficiently large $N$ and sufficiently large $a$, we can bound the first term on the right-hand side of (53) as

\begin{align*}
\sum_{K=1}^{\infty} a(K+1)\, P\left(\max_s x_s > aKN/\beta\right) &\le \sum_{K=1}^{\infty} a(K+1) \exp\left(2N \log \lambda_0 - \frac{a^2 K^2 N}{2\beta^2}\right) \\
&\le a \sum_{K=1}^{\infty} (K+1) \exp\left(-\frac{a^2 K^2 N}{4\beta^2}\right),
\end{align*}

which converges to zero as $a \to \infty$. Similar bounds allow us to reach the same conclusion for the second term on the right-hand side of (53). It then follows that the uniform integrability condition (51) holds for the sequence of random variables in (50) corresponding to the free energy density.
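The inequality behind (54) is easy to sanity-check numerically. The sketch below compares the exact tail of the maximum of $M$ i.i.d. standard Gaussians with the bound $M \exp(-t^2/2)$; the function names and the grid of $(M, t)$ values are hypothetical choices made for illustration.

```python
import math

def max_tail_exact(M, t):
    """P(max of M i.i.d. standard normals > t) = 1 - (1 - Phi(-t))**M.

    expm1 and log1p keep the result accurate when Phi(-t) is tiny.
    """
    q = 0.5 * math.erfc(t / math.sqrt(2))  # Phi(-t)
    return -math.expm1(M * math.log1p(-q))

def max_tail_bound(M, t):
    """Bound M * exp(-t**2 / 2), combining (1 - x)**M >= 1 - M*x with the
    Gaussian tail bound Phi(-t) <= exp(-t**2 / 2) for t > 0."""
    return M * math.exp(-t * t / 2)

checks = [(M, t, max_tail_exact(M, t), max_tail_bound(M, t))
          for M in (1, 10, 1000, 10**6)
          for t in (0.1, 0.5, 1.0, 2.0, 4.0, 6.0)]

for M, t, exact, bound in checks:
    print(M, t, exact, bound)
```

In the setting of (54), $M$ plays the role of $\#\mathcal{P}^N$ and $t$ the role of $aK\sqrt{N}/\beta$. The bound becomes useful once $t^2/2$ exceeds $\log M$, which is why the argument needs $a$ to be sufficiently large.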